Stars
Survey on LLM Agents (published at COLING 2025)
A paper list for Robotics / Embodied AI - Tianxing Chen
DeepTimber Robotics Talent Call | DeepTimber Community Embodied AI Talent Board | A list for Embodied AI / Robotics Jobs (PhD, RA, intern, full-time, etc
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
I have always had a weird knack of writing notes for each and every topic I did. This is a repository dedicated to those notes. Please feel free to use them and pass it on to those who you think mi…
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
[MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Code for the paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Awesome resources for in-context learning and prompt engineering: Mastery of the LLMs such as ChatGPT, GPT-3, and FlanT5, with up-to-date and cutting-edge updates.
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Official implementation of paper "One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications".
[AAAI2025] Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
One-key fast evaluation for salient object detection, with a GPU implementation of MAE, max F-measure, S-measure, and E-measure.
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
📚 Collection of awesome generation acceleration resources.
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding