LXDxmu
  • Xiamen University

This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"

Python, 61 stars, Updated Mar 15, 2025

[ICLR 2025] CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Python, 8 stars, Updated Feb 20, 2025

Official implementation for "Seagull: No-reference Image Quality Assessment for Regions of Interest via Visual-Language Instruction Tuning"

Python, 39 stars, 5 forks, Updated Mar 7, 2025

Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"

49 stars, Updated Mar 11, 2025

Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster".

Python, 54 stars, 1 fork, Updated Dec 14, 2024
Python, 11 stars, Updated Jan 19, 2025

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python, 1,256 stars, 56 forks, Updated Mar 12, 2025

Official implementation of MIA-DPO

Python, 54 stars, 2 forks, Updated Jan 23, 2025

Official implementation of the paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models'.

Python, 44 stars, 2 forks, Updated Feb 25, 2025

[arXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

114 stars, 4 forks, Updated Oct 18, 2024

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

Python, 76 stars, 6 forks, Updated Oct 10, 2024

[ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Python, 66 stars, 2 forks, Updated Jan 27, 2025

MMICL: a state-of-the-art VLM with in-context learning (ICL) ability, from PKU

Python, 46 stars, Updated Sep 29, 2023

Code for VPGTrans: Transfer Visual Prompt Generator across LLMs (VL-LLaMA, VL-Vicuna).

Python, 271 stars, 25 forks, Updated Oct 13, 2023
Python, 346 stars, 35 forks, Updated May 25, 2024

[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Python, 296 stars, 16 forks, Updated Feb 27, 2025

PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Python, 572 stars, 39 forks, Updated Jan 7, 2024

PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models"

Python, 20 stars, 1 fork, Updated Feb 8, 2025

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python, 199 stars, 14 forks, Updated Jan 6, 2025
HTML, 80 stars, 8 forks, Updated May 10, 2024

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Jupyter Notebook, 980 stars, 69 forks, Updated Mar 25, 2023