[Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [Chinese Introduction]
This repository contains the dataset link and the code for our paper University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization, ACM Multimedia 2020. The official paper link is https://dl.acm.org/doi/10.1145/3394171.3413896. We collect images of 1652 buildings from 72 universities around the world. Thank you for your kind attention.
Task 1: Drone-view target localization. (Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view.
Task 2: Drone navigation. (Satellite -> Drone) Given one satellite-view image, the task aims to find the most relevant place (drone-view images) that the drone has passed by. According to its flight history, the drone can then be navigated back to the target place.
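Both tasks reduce to ranking gallery images by their similarity to a query. The snippet below is a minimal sketch of that ranking step, assuming each image has already been mapped to a feature vector and that cosine similarity is the score. The toy feature values and the helper names `cosine_similarity` and `rank_gallery` are hypothetical and are not taken from this repository.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query_feature, gallery_features):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_feature, g) for g in gallery_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy 3-D features (hypothetical values, not from the dataset).
drone_query = [0.9, 0.1, 0.0]
satellite_gallery = [
    [0.0, 1.0, 0.0],  # building A
    [1.0, 0.2, 0.0],  # building B
    [0.0, 0.0, 1.0],  # building C
]

ranking = rank_gallery(drone_query, satellite_gallery)
print(ranking)  # index 1 (building B) is ranked first
```

For Task 2 the roles are swapped: the satellite-view feature is the query and the drone-view features form the gallery.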
The dataset split is as follows:
Split | #imgs | #buildings | #universities |
---|---|---|---|
Training | 50,218 | 701 | 33 |
Query_drone | 37,855 | 701 | 39 |
Query_satellite | 701 | 701 | 39 |
Query_ground | 2,579 | 701 | 39 |
Gallery_drone | 51,355 | 951 | 39 |
Gallery_satellite | 951 | 951 | 39 |
Gallery_ground | 2,921 | 793 | 39 |
More detailed file structure:
├── University-1652/
│   ├── readme.txt
│   ├── train/
│       ├── drone/                   /* drone-view training images
│           ├── 0001
│           ├── 0002
│           ...
│       ├── street/                  /* street-view training images
│       ├── satellite/               /* satellite-view training images
│       ├── google/                  /* noisy street-view training images (collected from Google Image)
│   ├── test/
│       ├── query_drone/
│       ├── gallery_drone/
│       ├── query_street/
│       ├── gallery_street/
│       ├── query_satellite/
│       ├── gallery_satellite/
│       ├── 4K_drone/
We note that there is no overlap between the 33 universities of the training set and the 39 universities of the test set.
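To sanity-check a downloaded copy against the layout above, a small script can count building folders and images per view. This is a rough sketch, assuming every view folder contains one sub-folder per building ID (as shown for `drone/`) and that images use the extensions listed in `IMAGE_SUFFIXES`. Both are assumptions, and `summarize_split` is a hypothetical helper rather than part of this repository. The demo builds a throwaway directory tree so the snippet runs without the dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: image file extensions

def summarize_split(split_dir):
    """Map each view folder to (number of building folders, number of images)."""
    summary = {}
    for view_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        buildings = [p for p in view_dir.iterdir() if p.is_dir()]
        n_images = sum(
            1
            for building in buildings
            for f in building.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[view_dir.name] = (len(buildings), n_images)
    return summary

# Demo on a tiny throwaway tree that mimics train/<view>/<building_id>/<image>.
with tempfile.TemporaryDirectory() as tmp:
    for view, building, name in [
        ("drone", "0001", "a.jpeg"),
        ("drone", "0001", "b.jpeg"),
        ("drone", "0002", "a.jpeg"),
        ("satellite", "0001", "a.jpg"),
    ]:
        folder = Path(tmp) / "train" / view / building
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()
    demo_summary = summarize_split(Path(tmp) / "train")

print(demo_summary)
```

On a real copy of the dataset, calling `summarize_split("University-1652/train")` should give counts that can be compared with the split table.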
3 March 2021 GeM Pooling is added. You may use it by passing --pool gem.
21 January 2021 The GPU-Re-Ranking, a GNN-based real-time post-processing code, is available here.
21 August 2020 The transfer learning code for Oxford and Paris is available here.
27 July 2020 The metadata of the 1652 buildings, such as latitude and longitude, is now available at Google Drive. (You could use Google Earth Pro to open the kml file or use vim to check the values.)
We also provide the spiral flight tour file at Google Drive. (You could open the kml file via Google Earth Pro to enable the flight camera.)
26 July 2020 The paper was accepted by ACM Multimedia 2020.
12 July 2020 I made the baseline of triplet loss (with soft margin) on University-1652 publicly available here.
12 March 2020 I added the state-of-the-art page for geo-localization and the tutorial, which will be updated soon.
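For readers unfamiliar with the GeM pooling mentioned in the 3 March 2021 entry, the snippet below sketches generalized-mean pooling over the spatial activations of a single channel: each value is raised to the power p, the results are averaged, and the p-th root is taken. It is written in plain Python for illustration only. The default `p=3.0`, the `eps` clamp, and the function name `gem_pool` are assumptions and may differ from what `--pool gem` does in this repository.

```python
def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean pool one channel's spatial activations into a scalar."""
    # Clamp to a small positive value so fractional powers stay well defined.
    clamped = [max(a, eps) for a in activations]
    mean_pow = sum(a ** p for a in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)

channel = [0.0, 1.0, 2.0, 5.0]          # toy activations (hypothetical values)
avg_like = gem_pool(channel, p=1.0)     # p = 1 behaves like average pooling
gem3 = gem_pool(channel, p=3.0)         # larger p emphasizes strong activations
max_like = gem_pool(channel, p=50.0)    # large p approaches max pooling
print(avg_like, gem3, max_like)
```

The exponent p interpolates between average pooling and max pooling, which is why GeM is often used as a drop-in replacement for either.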
We currently support:
- Float16 to save GPU memory based on apex
- Multiple Query Evaluation
- Re-Ranking
- Random Erasing
- ResNet/VGG-16
- Visualize Training Curves
- Visualize Ranking Result
- Linear Warm-up
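As an illustration of the linear warm-up item, the sketch below ramps the learning rate linearly from a fraction of its base value up to the full value over a fixed number of steps. The `start_factor=0.1` default, the per-step granularity, and the function name `warmup_lr` are assumptions made for this example; the training scripts in this repository may apply warm-up differently.

```python
def warmup_lr(base_lr, step, warmup_steps, start_factor=0.1):
    """Learning rate at `step`, ramping linearly to base_lr over warmup_steps."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    factor = start_factor + (1.0 - start_factor) * step / warmup_steps
    return base_lr * factor

# Example: base learning rate 0.02 with a 5-step warm-up.
schedule = [warmup_lr(0.02, s, warmup_steps=5) for s in range(7)]
print(schedule)
```

After the warm-up period the function simply returns the base learning rate, so any other decay schedule can be applied on top of it.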
Prerequisites:
- Python 3.6
- GPU Memory >= 8G
- NumPy > 1.12.1
- PyTorch 0.3+ (the more recent PyTorch 1.8 and 1.9 may not work due to changes in torchvision)
- [Optional] apex (for float16)
- Install PyTorch from http://pytorch.org/
- Install Torchvision from the source
  git clone https://github.com/pytorch/vision
  cd vision
  python setup.py install
- [Optional] Install apex from the source (you may skip this step)
  git clone https://github.com/NVIDIA/apex.git
  cd apex
  python setup.py install --cuda_ext --cpp_ext
Download [University-1652] upon request. You may use the request template.
For CVUSA, I follow the training/test split in (https://github.com/Liumouliu/OriCNN).
python train.py --name three_view_long_share_d0.75_256_s1_google --extra --views 3 --droprate 0.75 --share --stride 1 --h 256 --w 256 --fp16;
python test.py --name three_view_long_share_d0.75_256_s1_google
Default setting: Drone -> Satellite. If you want to try other evaluation settings, you may change these lines: https://github.com/layumi/University1652-Baseline/blob/master/test.py#L217-L225
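Conceptually, an evaluation setting is a choice of one query folder and one gallery folder under `test/`. The helper below is a hypothetical sketch of that mapping, using only the folder names from the dataset layout. It does not reproduce how `test.py` selects them, and the `test_root` default is an assumed path.

```python
VIEWS = ("drone", "satellite", "street")  # view names taken from the test/ folders

def evaluation_folders(query_view, gallery_view, test_root="University-1652/test"):
    """Return the (query, gallery) folder paths for one evaluation setting."""
    for view in (query_view, gallery_view):
        if view not in VIEWS:
            raise ValueError(f"unknown view: {view!r}")
    return f"{test_root}/query_{query_view}", f"{test_root}/gallery_{gallery_view}"

task1 = evaluation_folders("drone", "satellite")  # Drone -> Satellite (default)
task2 = evaluation_folders("satellite", "drone")  # Satellite -> Drone
print(task1)
print(task2)
```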
python train_no_street.py --name two_view_long_no_street_share_d0.75_256_s1 --share --views 3 --droprate 0.75 --stride 1 --h 256 --w 256 --fp16;
python test.py --name two_view_long_no_street_share_d0.75_256_s1
This setting still uses three views, but the weight of the loss on street-view images is set to zero.
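The effect of a zero loss weight can be sketched as a weighted sum of per-view losses, in which the street-view term no longer contributes to the total. The loss values, the weight dictionary, and the function `combine_view_losses` are hypothetical and only illustrate the idea; `train_no_street.py` may implement it differently.

```python
def combine_view_losses(losses, weights):
    """Weighted sum of per-view losses; a weight of 0 removes that view's term."""
    return sum(weights[view] * losses[view] for view in losses)

# Toy per-view loss values (hypothetical numbers).
losses = {"satellite": 1.2, "street": 3.0, "drone": 0.8}
weights = {"satellite": 1.0, "street": 0.0, "drone": 1.0}

total = combine_view_losses(losses, weights)
print(total)  # only the satellite and drone terms contribute
```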
python prepare_cvusa.py
python train_cvusa.py --name usa_vgg_noshare_warm5_lr2 --warm 5 --lr 0.02 --use_vgg16 --h 256 --w 256 --fp16 --batchsize 16;
python test_cvusa.py --name usa_vgg_noshare_warm5_lr2
You could download the trained model at GoogleDrive or OneDrive. After downloading, please put the model folders under ./model/.
The following paper uses and reports the results of the baseline model. You may cite it in your paper.
@article{zheng2020university,
title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
journal={ACM Multimedia},
year={2020}
}
Instance loss is defined in
@article{zheng2017dual,
title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
doi={10.1145/3383184},
volume={16},
number={2},
pages={1--23},
year={2020},
publisher={ACM New York, NY, USA}
}