- The University of Hong Kong
- China
- https://scholar.google.com/citations?hl=zh-CN&user=1euA66EAAAAJ&view_op=list_works&sortby=pubdate
Stars
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
MAGI-1: Autoregressive Video Generation at Scale
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
verl: Volcano Engine Reinforcement Learning for LLMs
A Unified Tokenizer for Visual Generation and Understanding
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
A generative world for general-purpose robotics & embodied AI learning.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
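This entry is the WebDataset library. A minimal sketch of its streaming pipeline, assuming the package is installed and using a hypothetical shard URL pattern:

```python
import torch
import webdataset as wds

# Hypothetical shard URL pattern; WebDataset expands the brace range itself.
url = "https://example.com/shards/train-{000000..000099}.tar"

dataset = (
    wds.WebDataset(url)
    .decode("pil")           # decode images inside the tar shards to PIL objects
    .to_tuple("jpg", "cls")  # yield (image, label) pairs, keyed by file extension
    .batched(32)             # collate into batches inside the pipeline
)
# batch_size=None because batching already happened in the pipeline above.
loader = torch.utils.data.DataLoader(dataset, batch_size=None, num_workers=4)
```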
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
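A minimal sketch of image Q&A with Qwen2.5-VL through Hugging Face transformers, following the model card; the image URL below is an assumption, and the 7B-Instruct checkpoint is one of several sizes.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/photo.jpg"},  # hypothetical URL
    {"type": "text", "text": "Describe this image."},
]}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the answer.
answer = processor.batch_decode(
    generated[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```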
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
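A minimal sketch of point-prompted image segmentation with SAM 2, based on the repo's example notebooks; the checkpoint id, input file, and click coordinates are assumptions.

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load a pretrained checkpoint from the Hugging Face Hub.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("example.jpg").convert("RGB"))  # hypothetical input
with torch.inference_mode():
    predictor.set_image(image)
    # Prompt with one click at pixel (x=500, y=375); label 1 marks foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```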
A high-throughput and memory-efficient inference and serving engine for LLMs
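This entry is vLLM. A minimal sketch of its offline-inference API, assuming the package is installed and the (assumed) model id fits in local GPU memory:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # any HF-compatible model id
sampling = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What is multimodal learning?"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```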
SEED-Voken: A Series of Powerful Visual Tokenizers
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)
Lumina-T2X is a unified framework for Text to Any Modality Generation
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection