wanghao15536870732
📚 Learning in school

Highlights

  • Pro

Organizations

@android-nuc @17070047

Starred repositories

11 Updated Nov 24, 2024

[NeurIPS 2024] WATT: Weight Average Test-Time Adaptation of CLIP

Python 46 2 Updated Sep 26, 2024

[ICLRW 2024] Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment

Python 44 1 Updated Jul 18, 2024

EUFCC-CIR: A Composed Image Retrieval Dataset for GLAM Collections

2 Updated Oct 3, 2024

[AAAI-2025] The official code of Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Python 24 Updated Feb 17, 2025

When do we not need larger vision models?

Python 371 12 Updated Feb 8, 2025

Provides pre-built flash-attention package wheels using GitHub Actions

25 1 Updated Feb 22, 2025

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 17,588 1,764 Updated Feb 26, 2025

[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"

Python 59 5 Updated Jul 4, 2024

[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset

Python 63 2 Updated Aug 14, 2024
Python 5 Updated Dec 16, 2024

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 38,001 4,640 Updated Mar 1, 2025

Rethinking Step-by-step Visual Reasoning in LLMs

Python 259 16 Updated Jan 24, 2025

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 6,037 519 Updated Sep 6, 2024

Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024

Python 18 1 Updated May 30, 2024

The official implementation of Natural Language Fine-Tuning

Python 44 3 Updated Jan 7, 2025

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"

Python 554 46 Updated Apr 24, 2022

Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".

Python 108 2 Updated Dec 13, 2024

ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model

Python 16 3 Updated Jan 31, 2024

[ECCV 2024] The Official Implementation for "AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection"

Python 196 9 Updated Dec 26, 2024

A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.

438 18 Updated Feb 13, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 6,774 444 Updated Jan 12, 2025

A CLIP-Hitchhiker's Guide to Long Video Retrieval [arXiv 2022]

Python 10 1 Updated Jan 30, 2024

The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"

Python 70 5 Updated Jan 14, 2025

Collection of Composed Image Retrieval (CIR) papers.

143 8 Updated Feb 27, 2025

CLIPScore EMNLP code

Python 211 27 Updated Dec 16, 2022

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)

Python 29 4 Updated Sep 5, 2023

[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval

Python 35 Updated Aug 27, 2024

Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.

Python 33 2 Updated Nov 16, 2024

Code for the paper "Finetuning CLIP to Reason about Pairwise Differences"

Python 13 1 Updated Oct 1, 2024