exiawsh

33 followers · 19 following

Stars

VITA-Group / Diffusion4D

"Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models", Hanwen Liang*, Yuyang Yin*, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, …

Python 261 5 Updated Jan 21, 2025

a-r-r-o-w / finetrainers

Memory-optimized training scripts for video models based on Diffusers

Python 760 79 Updated Jan 21, 2025

KwaiVGI / Koala-36M

Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content".

Python 133 4 Updated Nov 8, 2024

genmoai / mochi

The best OSS video generation models

Python 2,747 282 Updated Jan 8, 2025

VectorSpaceLab / Video-XL

🔥🔥First-ever hour scale video understanding models

Python 224 15 Updated Dec 22, 2024

jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 2,718 266 Updated Dec 21, 2024

aigc-apps / EasyAnimate

📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

Python 1,690 122 Updated Jan 21, 2025

baaivision / Emu3

Next-Token Prediction is All You Need

Python 1,969 78 Updated Oct 24, 2024

OpenGVLab / all-seeing

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World

Python 473 17 Updated Aug 9, 2024

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 6,604 579 Updated Jan 11, 2025

Oryx-mllm / Oryx

MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Python 276 15 Updated Dec 25, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,267 262 Updated Jan 21, 2025

4DVLab / IDKB

Official repository for paper "Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving"

25 Updated Dec 13, 2024

iGuoYanjun / C2ANet

Python 5 Updated Jul 31, 2024

NVlabs / EAGLE

Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Python 537 37 Updated Jan 20, 2025

RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 326 30 Updated Nov 19, 2024

yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,839 90 Updated Jan 15, 2025

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 1,986 143 Updated Jan 21, 2025

IVGSZ / Flash-VStream

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Python 152 13 Updated Dec 24, 2024

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,116 222 Updated Dec 3, 2024

magic-research / PLLaVA

Official repository for the paper PLLaVA

Python 634 48 Updated Jul 28, 2024

showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 283 33 Updated Aug 15, 2024

ShareGPT4Omni / ShareGPT4Video

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Python 1,029 47 Updated Oct 9, 2024

mlfoundations / open_clip

An open source implementation of CLIP.

Python 10,831 1,022 Updated Jan 4, 2025

OpenGVLab / LCL

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Python 68 3 Updated Jun 16, 2024

opendatalab / HA-DPO

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Python 77 6 Updated Jan 30, 2024

Kevinz-code / SeVa

[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Python 46 3 Updated Jul 26, 2024

GAIR-NLP / anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 711 38 Updated Aug 5, 2024

bit-lsj / HPHS

[IROS 2024] HPHS: Hierarchical Planning based on Hybrid Frontier Sampling for Unknown Environments Exploration

Python 41 3 Updated Oct 11, 2024

facebookresearch / chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,911 113 Updated Jul 29, 2024