Stars
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[NAACL 2025 Oral] 🎉 From redundancy to relevance: Enhancing explainability in multimodal large language models
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Improving Mamba performance on video understanding tasks
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Awesome LLM compression research papers and tools.
MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos
[CVPR 2024 Highlight] VBench: a comprehensive benchmark suite for evaluating video generation models
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
When do we not need larger vision models?
[ECCV 2024] Official PyTorch implementation of the technical part of Mixture of All Intelligence (MoAI), improving performance on numerous zero-shot vision-language tasks.
Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyperparameter optimization, and edge inference.
The project page of paper: CPM-Nets: Cross Partial Multi-View Networks [NeurIPS 2019 Spotlight paper]