- Xidian University
- Xi'an
- https://wanghao15536870732.github.io
Starred repositories
[NeurIPS 2024] WATT: Weight Average Test-Time Adaptation of CLIP
[ICLRW 2024] Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
EUFCC-CIR: A Composed Image Retrieval Dataset for GLAM Collections
[AAAI-2025] The official code of Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
When do we not need larger vision models?
Provides pre-built flash-attention package wheels using GitHub Actions
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Rethinking Step-by-step Visual Reasoning in LLMs
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024
The official implementation of Natural Language Fine-Tuning
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
[ECCV 2024] The official implementation of "AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection"
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A Clip-Hitchiker's Guide to Long Video Retrieval [Arxiv 2022]
The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"
Collection of Composed Image Retrieval (CIR) papers.
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
Code for the paper "Finetuning CLIP to Reason about Pairwise Differences"