Jialong Zuo 1,
Ying Nie 2,
Hanyu Zhou 1,
Huaxin Zhang 1,
Haoyu Wang 2,
Tianyu Guo 2,
Nong Sang 1,
Changxin Gao 1,*
1 Huazhong University of Science and Technology,
2 Huawei Noah’s Ark Lab.
* Corresponding Author.
-
ReIDZoo is a new, fully open-sourced pre-trained model zoo that meets diverse research and application needs in the field of person re-identification. It contains a series of CION pre-trained models spanning diverse structures and parameter scales, totaling 32 models across 10 different architectures, including GhostNet, ConvNeXt, RepViT, FastViT and more.
-
CION is our proposed person re-identification pre-training framework that deeply exploits cross-video identity correlations. It consists simply of a progressive identity correlation seeking strategy and an identity-guided self-distillation pre-training technique.
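The identity-guided self-distillation idea can be illustrated with a minimal, framework-agnostic sketch (pure Python, toy dimensions; every name below is illustrative, not the repository's API): a student network is trained to match a momentum (EMA) teacher's sharpened predictions across views that share the same identity.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_probs, student_logits, student_temp):
    """Cross-entropy H(teacher, student): the teacher's (sharpened)
    distribution is the target for the student's prediction."""
    student_probs = softmax(student_logits, student_temp)
    return -sum(t * math.log(s + 1e-12)
                for t, s in zip(teacher_probs, student_probs))

def ema_update(teacher_w, student_w, momentum):
    """The teacher's weights track the student via an exponential
    moving average instead of gradient descent."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_w, student_w)]

# Toy usage: teacher sees one view of an identity, student another.
teacher_probs = softmax([2.0, 0.5, -1.0], temperature=0.04)  # sharp target
loss = distill_loss(teacher_probs, [1.5, 0.7, -0.5], student_temp=0.1)
```

In the actual framework the two views come from images correlated to the same identity across different videos, which is what distinguishes identity-guided self-distillation from purely instance-level schemes such as DINO; the loss and EMA mechanics above are only the generic skeleton.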
-
CION-AL is a new large-scale person re-identification pre-training dataset with almost accurate identity labels. The images are obtained from LUPerson-NL based on our proposed progressive identity correlation seeking strategy. It contains 3,898,086 images of 246,904 identities in total.
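The progressive seeking strategy (see the paper for the actual algorithm) can be caricatured as repeatedly merging samples whose feature similarity exceeds a threshold that is progressively relaxed, so that the most confident identity correlations are committed first. A toy union-find sketch with made-up 2-D features and thresholds:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def progressive_merge(features, thresholds):
    """Merge samples into identity groups, starting from the strictest
    similarity threshold and progressively relaxing it (a simplified
    caricature of the paper's strategy, not its implementation)."""
    parent = list(range(len(features)))
    for thr in sorted(thresholds, reverse=True):  # strict -> relaxed
        for i in range(len(features)):
            for j in range(i + 1, len(features)):
                if cosine(features[i], features[j]) >= thr:
                    parent[find(parent, i)] = find(parent, j)
    return [find(parent, i) for i in range(len(features))]

# Toy usage: the first two samples are near-duplicates, the third is not.
labels = progressive_merge([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]],
                           thresholds=[0.95, 0.8])
```

The real strategy operates on noisy-labeled tracklets across videos and is what produces the "almost accurate" labels of CION-AL; this sketch only conveys the progressive, confidence-first merging intuition.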
Our pre-trained models enable existing person ReID algorithms to achieve significantly better performance without bells and whistles. In this project, we will open-source the code, models and dataset. More details can be found in our paper Cross-video Identity Correlating for Person Re-identification Pre-training.
Note: To address the privacy concerns associated with crawling videos from the internet and the application of person ReID technology, we will implement a controlled release of our models and dataset, thereby preventing privacy violations and ensuring information security.
- 🙂[2024.11.26] We release the downstream fine-tuning code.
- 🙂[2024.9.30] Our paper is released on Arxiv.
- 🙂[2024.9.26] Good News! Our paper is accepted by NeurIPS2024.
- Release the paper.
- Release the 32 models of ReIDZoo.
- Release the CION-AL dataset.
- Release the pre-training code of CION.
- Release the downstream fine-tuning code.
ReIDZoo contains 32 CION pre-trained models with 10 different structures. Among them, GhostNet, EdgeNeXt, RepViT and FastViT are representative lightweight designs, which have smaller computational overhead and are convenient for practical deployment. Meanwhile, ResNet, ResNet-IBN, ConvNeXt, VOLO, Vision Transformer and Swin Transformer are conventional models, which usually have more parameters and achieve better performance.
The supervised fine-tuning performance of each model is shown in the table below. The up-arrow value represents the performance improvement compared to each corresponding ImageNet pre-trained model. As a common practice, we utilized MGN and TransReID as the fine-tuning algorithms. Please refer to our paper for more experimental details and results.
To obtain the ReIDZoo, please download and sign the license agreement (ReIDZoo_License.pdf) and send it to [email protected] or [email protected]. Once the procedure is approved, the download link will be sent to your email.
CION-AL is a large-scale person re-identification pre-training dataset with almost accurate identity labels. In constructing the CION-AL dataset, we utilize our proposed progressive identity correlation seeking strategy to tag the images in LUPerson-NL with more accurate identity labels. It contains 3,898,086 images of 246,904 identities in total.
To obtain the CION-AL dataset, please download and sign the license agreement (CION_AL_License.pdf) and send it to [email protected] or [email protected]. Once the procedure is approved, the download link will be sent to your email.
CION is our proposed Cross-video Identity-cOrrelating pre-traiNing framework for person re-identification. By utilizing CION to pre-train the models, it becomes easier to learn identity-invariant representations for person re-identification. Below, we provide guidelines for pre-training the models from scratch.
Download the CION-AL dataset and organize it in the dataset folder as follows:
|-- CION_Pretrain/
|   |-- dataset/
|   |   |-- CION_AL/
|   |   |   |-- sec0
|   |   |   |-- sec1
|   |   |   |-- ...
|   |   |   |-- sec129
|   |   |   |-- cional_annos.json
|   |-- data/
|   |-- models/
|   |-- loss.py
|   |-- run.py
|   |-- utils.py
As a default setting, we use 8×V100 (32GB) GPUs for pre-training the models. We set different batch sizes and numbers of local cropped views for different models to achieve better computational resource utilization. Specific settings for each model can be found in Table 5 of our paper.
Taking ResNet50-IBN as an example, run the following command to implement the pre-training process. On 8×V100s, the entire pre-training process will take approximately 5 days.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 run.py --arch resnet50_ibn --batch_size_per_gpu 120 --local_crops_number 8 --output_dir logs/resnet50_ibn
We utilize TransReID, MGN and C-Contrast to implement the downstream fine-tuning tasks. Please find more details in the CION_Finetune folder.
We thank these great works and open-source repositories: DINO, LUPerson-NL, TransReID, FastReID, cluster-contrast-reid, TransReID-SSL, GhostNet, ResNet, ResNet-IBN, EdgeNeXt, RepViT, FastViT, ConvNeXt, Vision Transformer, Swin Transformer and VOLO.
@inproceedings{
zuo2024crossvideo,
title={Cross-video Identity Correlating for Person Re-identification Pre-training},
author={Jialong Zuo and Ying Nie and Hanyu Zhou and Huaxin Zhang and Haoyu Wang and Tianyu Guo and Nong Sang and Changxin Gao},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}