FDInSky's starred repositories

🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 209 1 Updated Dec 28, 2024

Low-level locomotion policy training in Isaac Lab

Python 64 4 Updated Dec 15, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 2,257 189 Updated Dec 30, 2024

Multimodal Models in Real World

Jupyter Notebook 423 19 Updated Oct 28, 2024
Python 199 16 Updated Dec 20, 2023

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 448 25 Updated Oct 20, 2024

Code for "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT"

Python 83 4 Updated Jan 3, 2025

Liquid: Language Models are Scalable Multi-modal Generators

55 Updated Dec 12, 2024

[CVPR'2024] "SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution"

Python 57 3 Updated Sep 29, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 6,633 519 Updated Dec 25, 2024

CoRL2024 | Hint-AD: Holistically Aligned Interpretability for End-to-End Autonomous Driving

Python 48 1 Updated Oct 30, 2024

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.

Python 3,172 381 Updated Dec 23, 2024

A family of versatile and state-of-the-art video tokenizers.

Python 307 19 Updated Dec 29, 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Python 345 15 Updated Dec 24, 2024
Python 52 7 Updated Jan 2, 2025

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python 191 3 Updated Oct 24, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 868 34 Updated Dec 29, 2024
Python 186 6 Updated Dec 29, 2024

A simple testbed for robotics manipulation policies

Python 70 3 Updated Dec 5, 2024

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 954 79 Updated Dec 18, 2024
Python 64 1 Updated Dec 30, 2024

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Python 407 7 Updated Dec 2, 2024

OpenEMMA, a permissively licensed open source reproduction of Waymo’s EMMA model.

Python 344 38 Updated Jan 3, 2025

The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".

Python 32 2 Updated Dec 30, 2024

Code for "Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning"

C++ 20 1 Updated Jan 2, 2025

[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre-trained weights

C++ 84 9 Updated Oct 16, 2024

Reimplementation of GR-1, a generalized policy for robotics manipulation.

Python 111 3 Updated Sep 4, 2024

A generative world for general-purpose robotics & embodied AI learning.

Python 21,426 1,690 Updated Jan 3, 2025

Summary of key papers and blogs about diffusion models to learn about the topic. Detailed list of all published diffusion robotics papers.

728 39 Updated Sep 20, 2024

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"

Python 84 7 Updated Dec 14, 2024