- Zhejiang University
- Hangzhou, China
- https://espere-1119-song.github.io/
Stars
A paper list of recent work on token compression for ViTs and VLMs
Codebase for Aria - an Open Multimodal Native MoE
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.
😎 A curated list of awesome GitHub profiles, updated in real time