Tarsier is a family of large-scale video-language models designed to generate high-quality video descriptions, with good general video understanding capability.

Python 324 19 Updated Feb 17, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,893 216 Updated Mar 4, 2025

stepfun-ai / Step-Audio

Python 4,023 322 Updated Mar 12, 2025

stepfun-ai / Step-Video-T2V

Python 2,687 235 Updated Mar 17, 2025

Saiyan-World / goku

Video Generation Foundation Models: https://saiyan-world.github.io/goku/

Python 2,735 288 Updated Feb 19, 2025

LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,368 77 Updated Sep 27, 2024

Jiayi-Pan / TinyZero

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,281 1,430 Updated Mar 10, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 23,095 2,100 Updated Mar 20, 2025

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,508 863 Updated Mar 18, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,806 2,202 Updated Feb 1, 2025

deepseek-ai / DeepSeek-R1

87,028 11,236 Updated Feb 24, 2025

ByteFlow-AI / TokenFlow

[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 290 1 Updated Mar 5, 2025

lxtGH / OMG-Seg

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,258 46 Updated Dec 11, 2024

NVIDIA / Cosmos

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,748 500 Updated Mar 20, 2025

Xiangtai Li lxtGH
