- Singapore
- https://lxtgh.github.io/
- @xtl994
Stars
Paper List of Inference/Test Time Scaling/Computing
(TPAMI 2024) A Survey on Open Vocabulary Learning
[T-PAMI-2024] Transformer-Based Visual Segmentation: A Survey
This is a repo to track the latest autoregressive visual generation papers.
Fast and memory-efficient exact attention
Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
A Unified Tokenizer for Visual Generation and Understanding
[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Open reproduction of MUSE for fast text2image generation.
Wan: Open and Advanced Large-Scale Video Generative Models
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
Official Repo for Open-Reasoner-Zero
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, together with good general video understanding capability.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Janus-Series: Unified Multimodal Understanding and Generation Models
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…