cuijh26

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python · 479 stars · 28 forks · Updated Oct 20, 2024

Python · 9 stars · 1 fork · Updated Feb 5, 2025

The official implementation of paper "ColorFlow: Retrieval-Augmented Image Sequence Colorization"

Python · 355 stars · 29 forks · Updated Dec 23, 2024

A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data.

Python · 2,385 stars · 180 forks · Updated Feb 7, 2025

[ICLR 2025] Reconstructive Visual Instruction Tuning

Python · 52 stars · 2 forks · Updated Jan 23, 2025

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Python · 3,410 stars · 343 forks · Updated Nov 3, 2024

DiT for VAE (and Video Generation)

Python · 32 stars · 3 forks · Updated Sep 2, 2024

Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"

Python · 288 stars · 18 forks · Updated Dec 23, 2024

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

Python · 93 stars · 5 forks · Updated Feb 9, 2025

A fork to add multimodal model training to open-r1

Python · 502 stars · 26 forks · Updated Feb 8, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 15,424 stars · 2,014 forks · Updated Feb 1, 2025

[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Python · 48 stars · 6 forks · Updated Feb 10, 2025

[NeurIPS 2024] Generalizable and Animatable Gaussian Head Avatar

Python · 416 stars · 37 forks · Updated Nov 29, 2024

[arXiv 2024] X-Dyna: Expressive Dynamic Human Image Animation

Python · 150 stars · 12 forks · Updated Jan 30, 2025

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python · 936 stars · 41 forks · Updated Feb 1, 2025

FastVideo is a lightweight framework for accelerating large video diffusion models.

Python · 980 stars · 59 forks · Updated Feb 7, 2025

Python · 83 stars · 4 forks · Updated Jul 8, 2024

Official implementation of SVFR.

Python · 699 stars · 65 forks · Updated Jan 19, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python · 1,266 stars · 68 forks · Updated Sep 27, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python · 302 stars · 36 forks · Updated Aug 15, 2024

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

102 stars · 3 forks · Updated Jan 13, 2025

Blending Custom Photos with Video Diffusion Transformers

Python · 42 stars · 1 fork · Updated Jan 21, 2025

An 8-step inversion and 8-step editing process works effectively with the FLUX-dev model (3x speedup, with results comparable or even superior to baseline methods).

Python · 222 stars · 12 forks · Updated Jan 25, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Python · 7,413 stars · 464 forks · Updated Jan 28, 2025

[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

338 stars · 9 forks · Updated Jan 17, 2025

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python · 11,490 stars · 1,148 forks · Updated Feb 3, 2025

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Python · 9,982 stars · 736 forks · Updated Dec 4, 2024

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

Python · 1,022 stars · 135 forks · Updated Jan 29, 2025

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Python · 3,466 stars · 502 forks · Updated Jan 24, 2025