"Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models", Hanwen Liang*, Yuyang Yin*, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, …

Python 261 5 Updated Jan 21, 2025

Memory-optimized training scripts for video models based on Diffusers

Python 760 79 Updated Jan 21, 2025

Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content".

Python 133 4 Updated Nov 8, 2024

The best OSS video generation models

Python 2,747 282 Updated Jan 8, 2025

🔥🔥First-ever hour-scale video understanding models

Python 224 15 Updated Dec 22, 2024

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 2,718 266 Updated Dec 21, 2024

📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

Python 1,690 122 Updated Jan 21, 2025

Next-Token Prediction is All You Need

Python 1,969 78 Updated Oct 24, 2024

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World

Python 473 17 Updated Aug 9, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 6,604 579 Updated Jan 11, 2025

MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Python 276 15 Updated Dec 25, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,267 262 Updated Jan 21, 2025

Official repository for paper "Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving"

25 Updated Dec 13, 2024

Python 5 Updated Jul 31, 2024

Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Python 537 37 Updated Jan 20, 2025

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 326 30 Updated Nov 19, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,839 90 Updated Jan 15, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 1,986 143 Updated Jan 21, 2025

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Python 152 13 Updated Dec 24, 2024

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,116 222 Updated Dec 3, 2024

Official repository for the paper PLLaVA

Python 634 48 Updated Jul 28, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 283 33 Updated Aug 15, 2024

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Python 1,029 47 Updated Oct 9, 2024

An open source implementation of CLIP.

Python 10,831 1,022 Updated Jan 4, 2025

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Python 68 3 Updated Jun 16, 2024

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Python 77 6 Updated Jan 30, 2024

[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Python 46 3 Updated Jul 26, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 711 38 Updated Aug 5, 2024

[IROS 2024] HPHS: Hierarchical Planning based on Hybrid Frontier Sampling for Unknown Environments Exploration

Python 41 3 Updated Oct 11, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,911 113 Updated Jul 29, 2024