Stars
An Open-source Toolkit for LLM Development
Awesome papers & datasets specifically focused on long-term videos.
Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, embodied agents, and VLMs.
Benchmarking Knowledge Transfer in Lifelong Robot Learning
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Code for the paper "AutoPresent: Designing Structured Visuals From Scratch"
A bibliography and survey of the papers surrounding o1
Official repo and evaluation implementation of VSI-Bench
A generative world for general-purpose robotics & embodied AI learning.
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
LaTeX package and annotated examples for annotating equations using TikZ.
[NeurIPS 2024] SceneCraft: Layout-Guided 3D Scene Generation
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
[CVPR2024] Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models