lxtGH
Paper List of Inference/Test Time Scaling/Computing

Python 112 3 Updated Mar 19, 2025

(TPAMI 2024) A Survey on Open Vocabulary Learning

904 51 Updated Mar 11, 2025

[T-PAMI-2024] Transformer-Based Visual Segmentation: A Survey

736 51 Updated Aug 25, 2024

This is a repo to track the latest autoregressive visual generation papers.

168 1 Updated Mar 20, 2025

Fast and memory-efficient exact attention

Python 16,432 1,549 Updated Mar 15, 2025

Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"

Python 727 60 Updated Feb 5, 2025

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

Python 1,140 91 Updated Mar 20, 2025

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

Python 940 65 Updated Mar 21, 2025

A Unified Tokenizer for Visual Generation and Understanding

Python 208 5 Updated Mar 3, 2025

[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Python 292 10 Updated Mar 20, 2025

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python 980 63 Updated Mar 19, 2025

Open reproduction of MUSE for fast text2image generation.

Python 347 29 Updated Jun 1, 2024

Wan: Open and Advanced Large-Scale Video Generative Models

Python 8,795 940 Updated Mar 20, 2025

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 971 49 Updated Feb 25, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,347 805 Updated Mar 1, 2025

Official Repo for Open-Reasoner-Zero

Python 1,658 78 Updated Mar 5, 2025

Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capabilities.

Python 324 19 Updated Feb 17, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,893 216 Updated Mar 4, 2025
Python 4,023 322 Updated Mar 12, 2025

Video Generation Foundation Models: https://saiyan-world.github.io/goku/

Python 2,735 288 Updated Feb 19, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,368 77 Updated Sep 27, 2024

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,281 1,430 Updated Mar 10, 2025

Fully open reproduction of DeepSeek-R1

Python 23,095 2,100 Updated Mar 20, 2025

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,508 863 Updated Mar 18, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,806 2,202 Updated Feb 1, 2025

[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 290 1 Updated Mar 5, 2025

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,258 46 Updated Dec 11, 2024

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,748 500 Updated Mar 20, 2025