
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 536 18 Updated Apr 22, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 1,902 73 Updated Apr 23, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

769 35 Updated Apr 20, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 2,128 152 Updated Apr 23, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

660 34 Updated Apr 22, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 7,030 768 Updated Apr 23, 2025

A Unified Tokenizer for Visual Generation and Understanding

Python 261 5 Updated Apr 15, 2025

Video Generation Foundation Models: https://saiyan-world.github.io/goku/

Python 2,807 298 Updated Feb 19, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

Jupyter Notebook 7,932 510 Updated Apr 2, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 24,823 2,181 Updated Apr 22, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 2,259 339 Updated Apr 23, 2025

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 2,566 203 Updated Feb 12, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 9,760 844 Updated Apr 18, 2025
Python 84 5 Updated Nov 27, 2024
Python 3,721 347 Updated Feb 24, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 9,954 702 Updated Apr 23, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,900 197 Updated Apr 19, 2025

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 588 27 Updated Apr 1, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 15,110 1,668 Updated Dec 25, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 45,607 7,031 Updated Apr 23, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 868 31 Updated Feb 19, 2025

Localized Gaussian Point Management

Python 61 3 Updated Mar 7, 2025

[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Python 339 7 Updated Mar 20, 2025

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,719 75 Updated Aug 15, 2024

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)

Python 136 3 Updated Jan 24, 2025

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,184 92 Updated Feb 16, 2025

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python 563 44 Updated Jun 7, 2024

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 7,600 472 Updated Mar 22, 2025

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Python 167 7 Updated Mar 29, 2025