Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are modelled in practice, e.g. satellite, microscopic, and gaming imagery, making it difficult to assess the degree of generalization learned by the model. We introduce Roboflow-100 (RF100), consisting of 100 datasets, 7 imagery domains, 224,714 images, and 805 class labels, with over 11,170 labelling hours. We derived RF100 from over 90,000 public datasets and 60 million public images that are actively being assembled and labelled by computer vision practitioners in the open on the web application Roboflow Universe. By releasing RF100, we aim to provide a semantically diverse, multi-domain benchmark of datasets to help researchers test their models' generalizability with real-life data. RF100 download and benchmark replication are available on GitHub.
# current path is projects/RF100-Benchmark/
├── configs
│ ├── dino_r50_fpn_ms_8xb8_tweeter-profile.py
│ ├── faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py
│ └── tood_r50_fpn_ms_8xb8_tweeter-profile.py
├── README.md
├── README_zh-CN.md
├── rf100
└── scripts
    ├── create_new_config.py   # Generates the training configs for the remaining 99 datasets from the provided config
    ├── datasets_links_640.txt # Dataset download links, from the official repo
    ├── download_dataset.py    # Dataset download code, from the official repo
    ├── download_datasets.sh   # Dataset download script, from the official repo
    ├── labels_names.json      # Dataset metadata, from the official repo (it contained some errors, so we modified it)
    ├── parse_dataset_link.py  # From the official repo
    ├── log_extract.py         # Collects and collates the training results
    ├── dist_train.sh          # Training and evaluation launch script
    └── slurm_train.sh         # Slurm training and evaluation launch script
The Roboflow 100 dataset is hosted on the Roboflow platform, and detailed download scripts are provided in the roboflow-100-benchmark repository. For simplicity, we use the official download script directly.
Before downloading the data, you need to register an account on the Roboflow platform to get the API key.
export ROBOFLOW_API_KEY=your_private_api_key
You also need to install the roboflow package.
pip install roboflow
Finally, use the following command to download the dataset.
cd projects/RF100-Benchmark/
bash scripts/download_datasets.sh
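For reference, the official script essentially drives the roboflow package; downloading a single dataset looks roughly like the sketch below. The workspace, project, and version values here are illustrative assumptions; the official scripts parse the real ones from `scripts/datasets_links_640.txt`.

```python
import os

from roboflow import Roboflow

# Authenticate with the API key exported above.
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])

# Illustrative identifiers; the real workspace/project/version values are
# parsed from scripts/datasets_links_640.txt by the official scripts.
project = rf.workspace("roboflow-100").project("tweeter-profile")
project.version(1).download("coco", location="rf100/tweeter-profile")
```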
After the download completes, an `rf100` folder will be generated in the current directory `projects/RF100-Benchmark/`, which contains all the datasets. The structure is as follows:
# current path is projects/RF100-Benchmark/
├── README.md
├── README_zh-CN.md
├── scripts
│   └── datasets_links_640.txt
└── rf100
    ├── tweeter-profile
    │   ├── train
    │   │   ├── 0b3la49zec231_jpg.rf.8913f1b7db315c31d09b1d2f583fb521.jpg
    │   │   └── _annotations.coco.json
    │   ├── valid
    │   │   ├── 0fcjw3hbfdy41_jpg.rf.d61585a742f6e9d1a46645389b0073ff.jpg
    │   │   └── _annotations.coco.json
    │   ├── test
    │   │   ├── 0dh0to01eum41_jpg.rf.dcca24808bb396cdc07eda27a2cea2d4.jpg
    │   │   └── _annotations.coco.json
    │   ├── README.dataset.txt
    │   └── README.roboflow.txt
    ├── 4-fold-defect
    ...
The full dataset takes up 12.3 GB of storage space. If you don't want to download, train, and evaluate all 100 datasets at once, you can modify the `scripts/datasets_links_640.txt` file and delete the links of the datasets you don't want to use.
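Each split ships with a COCO-format `_annotations.coco.json` file (see the tree above), so you can sanity-check what was downloaded with a few lines of Python. A minimal sketch, assuming the `rf100` folder layout shown above:

```python
import json
from pathlib import Path

# Print image and class counts for every downloaded dataset (COCO format).
for ann_file in sorted(Path("rf100").glob("*/train/_annotations.coco.json")):
    data = json.loads(ann_file.read_text())
    dataset = ann_file.parent.parent.name
    print(f"{dataset}: {len(data['images'])} images, "
          f"{len(data['categories'])} classes")
```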
The characteristics of the Roboflow 100 datasets are shown in the following figure.
If you want a clearer understanding of the datasets, you can check the roboflow-100-benchmark repository, which provides many dataset analysis scripts.
If you want to train and evaluate on all the datasets at once, you can use the following commands.
- Single GPU Training
# current path is projects/RF100-Benchmark/
bash scripts/dist_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 1
# Specify the save path
bash scripts/dist_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 1 my_work_dirs
- Distributed Multi-GPU Training
bash scripts/dist_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 8
# Specify the save path
bash scripts/dist_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 8 my_work_dirs
- Slurm Training
bash scripts/slurm_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 8
# Specify the save path
bash scripts/slurm_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 8 my_work_dirs
After training, a `work_dirs` folder will be generated in the current directory, containing the trained model weights and logs.
- For users who want to debug or train only specific datasets, we provide the `DEBUG` variable in `scripts/*_train.sh`: simply set it to 1, and specify the datasets you want to train in the `datasets_list` variable.
- Considering that users may encounter training failures on certain datasets for various reasons, we provide the `RETRY_PATH` variable: pass in a txt file listing datasets, and the program will train only the datasets listed in that file. If it is not provided, all datasets are trained.
RETRY_PATH=failed_dataset_list.txt bash scripts/dist_train.sh configs/faster-rcnn_r50_fpn_ms_8xb8_tweeter-profile.py 8 my_work_dirs
The txt file should contain one dataset name per line, as shown below (the blank 4th line, i.e. the trailing newline, is indispensable):
acl-x-ray
tweeter-profile
abdomen-mri
The txt file can also be generated by the `log_extract.py` script introduced later, so you don't have to create it manually.
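If you do write the list by hand, note the trailing-newline requirement above. A minimal Python sketch that writes a valid list:

```python
# Dataset names to retry, one per line; the trailing "\n" produces the
# required blank final line.
failed = ["acl-x-ray", "tweeter-profile", "abdomen-mri"]
with open("failed_dataset_list.txt", "w") as f:
    f.write("\n".join(failed) + "\n")
```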
If you want to collect the results after training, or even while training is still in progress, you can execute the `log_extract.py` script, which collects the information under `work_dirs` and outputs it in csv and xlsx format.
Before running the script, please make sure that `pandas` and `openpyxl` are installed.
python scripts/log_extract.py faster_rcnn --epoch 25 --work-dirs my_work_dirs
- The first positional argument is used as the csv title, so any string is accepted, but entering the model name is recommended for easier viewing later.
- The `--epoch` parameter is the number of training epochs and is used to parse the log. By default we train each dataset for 100 epochs, but since `RepeatDataset` is used in the configs, the actual number of training epochs is 25.
- `--work-dirs` is the working directory where the trained models are saved. It defaults to the `work_dirs` folder under the current path.
After running, the following new files will be generated in `my_work_dirs`:
timestamp_detail.xlsx   # Detailed per-dataset results for the 100 datasets.
timestamp_sum.xlsx      # Summary results over the 100 datasets.
timestamp_eval.csv      # Evaluation results of the 100 datasets in training order.
failed_dataset_list.txt # Names of datasets that failed to train; can be passed via RETRY_PATH.
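To quickly inspect the collected results, you can load the generated csv with `pandas`. A minimal sketch; the actual file name contains a real timestamp, and the exact column names are defined by `log_extract.py`:

```python
import pandas as pd

# Replace with the real timestamped file name produced by log_extract.py.
df = pd.read_csv("my_work_dirs/timestamp_eval.csv")

print(df.head())      # first few per-dataset evaluation rows
print(df.describe())  # quick statistics over the numeric columns
```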
Currently, we provide evaluation results for the Faster RCNN, TOOD, and DINO algorithms (without careful parameter tuning). You can also quickly evaluate your own model following the process above.
💎 The detailed table can be accessed directly here 💎
To ensure a fair comparison without special parameter tuning, the Faster RCNN, TOOD, and DINO algorithms use the same number of epochs and the same data augmentation strategy, all load COCO pre-trained weights, and all save the best-performing model on the validation set during training. Additional notes:
- To speed up training, all models are trained with 8 GPUs. Except for the datasets where the DINO algorithm runs out of memory (OOM), all models and datasets are trained on 8 RTX 3090s.
- Because the 5 datasets 'bacteria-ptywi', 'circuit-elements', 'marbles', 'printed-circuit-board', and 'solar-panels-taxvb' have a very large number of GT boxes in a single image, DINO cannot be trained on the 3090 for them, so we train these 5 datasets on A100s.
As the above figure shows, the DINO algorithm performs better than traditional CNN-based detectors such as Faster RCNN and TOOD, which suggests that Transformer-based detectors are also superior across different domains and data volumes. However, if a particular domain is analyzed separately, this may not be the case.
The Roboflow 100 datasets also have some defects:

- Some datasets have very few training images, so benchmarking them with the same hyperparameters may lead to poor performance.
- Some datasets contain a very large number of very small objects; Faster RCNN, TOOD, and DINO all perform very poorly on them without dataset-specific tuning, so users can ignore the results of these datasets.
- Some datasets are annotated rather carelessly, which may result in poor performance if you want to apply them to image-text detection models.
Finally, some additional notes:

- Since there are 100 datasets, we cannot check each one, so if you find anything unreasonable, please give us feedback and we will fix it as soon as possible.
- We also provide summary results at various scales, such as mAP_s, but because some datasets contain no bounding boxes at a particular scale, we ignore those datasets when summarizing that scale.
If you want to benchmark other algorithms on Roboflow 100, you only need to add their configurations to the `projects/RF100-Benchmark/configs` folder.
Note: internally, the per-dataset training configs are generated by string replacement on the user-provided configuration, so the config you provide must be for the `tweeter-profile` dataset and must include the `data_root` and `class_name` variables; otherwise the program will report an error.
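For reference, a minimal sketch of such a config; the `_base_` path, class name, and dataloader structure below are illustrative assumptions, and the provided configs in `configs/` are the authoritative templates:

```python
# Illustrative sketch only, not one of the shipped configs.
_base_ = 'mmdet::faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py'  # assumed base config

# Both variables are required: create_new_config.py rewrites them by string
# replacement to generate the configs for the remaining 99 datasets.
data_root = 'rf100/tweeter-profile/'
class_name = ('profile_info',)  # hypothetical class name for tweeter-profile

metainfo = dict(classes=class_name)

# Match the detection head to the dataset's class count.
model = dict(roi_head=dict(bbox_head=dict(num_classes=len(class_name))))

train_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='train/_annotations.coco.json',
        data_prefix=dict(img='train/')))
```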
@misc{2211.13523,
Author = {Floriana Ciaglia and Francesco Saverio Zuppichini and Paul Guerrie and Mark McQuade and Jacob Solawetz},
Title = {Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
Year = {2022},
Eprint = {arXiv:2211.13523},
}