Starred repositories
🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
Low-level locomotion policy training in Isaac Lab
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Code for "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT"
Liquid: Language Models are Scalable Multi-modal Generators
[CVPR'2024] "SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
CoRL2024 | Hint-AD: Holistically Aligned Interpretability for End-to-End Autonomous Driving
OpenDILab Decision AI Engine. The most comprehensive reinforcement learning framework.
A family of versatile and state-of-the-art video tokenizers.
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
A simple testbed for robotics manipulation policies
Codebase for Aria - an Open Multimodal Native MoE
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
OpenEMMA, a permissively licensed open source reproduction of Waymo’s EMMA model.
The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
Code for "Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning"
[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre-trained weights
Reimplementation of GR-1, a generalized policy for robotics manipulation.
A generative world for general-purpose robotics & embodied AI learning.
A summary of key papers and blog posts for learning about diffusion models, plus a detailed list of published diffusion-based robotics papers.
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"