This project aims to provide a codebase for the image classification task implemented in PyTorch. It does not use any high-level deep learning libraries (such as pytorch-lightning or MMClassification), so it should be easy to follow and modify.
The code is tested on python==3.9, pyhocon==0.3.57, torch==1.8.0, torchvision==0.9.0.
You can get started with a ResNet-20 convolutional network on CIFAR-10 with the following command.
Single node, single GPU:
CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/cifar10.conf -o output/cifar10/resnet20
Tips: run
CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/resnet50-benchmark.conf -o output/benchmark
to check throughput performance. More details can be found in doc/benchmark.md.
You can use multiple GPUs to accelerate the training with distributed data parallel:
Single node, multiple GPUs:
CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 2 \
--conf conf/cifar10.conf -o output/cifar10/resnet20
Multiple nodes:
Node 0:
CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 4 --dist-url \
'tcp://IP_OF_NODE0:FREEPORT' --node-rank 0 --conf conf/cifar10.conf -o output/cifar10/resnet20
Node 1:
CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 4 --dist-url \
'tcp://IP_OF_NODE0:FREEPORT' --node-rank 1 --conf conf/cifar10.conf -o output/cifar10/resnet20
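The relationship between --world-size, --node-rank, and the GPUs visible on each node can be sketched as follows. This is a minimal illustration that assumes --world-size counts one process per GPU across all nodes, as the commands above suggest (2 nodes with 2 GPUs each give a world size of 4). The function name `global_ranks` is hypothetical and not part of this codebase.

```python
def global_ranks(node_rank: int, gpus_per_node: int, world_size: int) -> list:
    """Return the global ranks of the processes running on one node.

    Assumes one process per GPU and the same number of GPUs on every node.
    """
    if world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a multiple of gpus_per_node")
    n_nodes = world_size // gpus_per_node
    if not 0 <= node_rank < n_nodes:
        raise ValueError(f"node_rank must be in [0, {n_nodes})")
    # Ranks are assigned contiguously, node by node.
    start = node_rank * gpus_per_node
    return list(range(start, start + gpus_per_node))


if __name__ == "__main__":
    for node in range(2):
        print(f"node {node}: global ranks {global_ranks(node, 2, 4)}")
```

Under these assumptions, node 0 hosts ranks 0 and 1, and node 1 hosts ranks 2 and 3.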
This codebase adopts configuration files (.hocon) to store the hyperparameters (such as the learning rate and the number of training epochs).
There are two ways to modify the hyperparameters in the configuration:
- Modify the configuration file to generate a new file.
- Add -M to the command line to modify the hyperparameters temporarily.
For example, to set the total number of training epochs to 100 and the learning rate to 0.05, you can run the following command:
CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/cifar10.conf -o output/cifar10/resnet20 -M max_epochs=100 optimizer.lr=0.05
If you try to modify a nonexistent hyperparameter, the code will raise an exception.
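The behaviour of -M can be pictured with the sketch below: each KEY=VALUE pair replaces an existing entry in a nested configuration, and an unknown key raises an exception. It is an illustration on plain Python dictionaries, not the code used by entry.run. The function `apply_overrides` and the sample values are hypothetical.

```python
import ast


def apply_overrides(config: dict, overrides: list) -> dict:
    """Apply 'a.b=value' overrides in place; reject keys that do not exist."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        # Walk down the nested dictionaries along the dotted path.
        for name in parents:
            if not isinstance(node.get(name), dict):
                raise KeyError(f"unknown hyperparameter: {dotted_key}")
            node = node[name]
        if leaf not in node:
            raise KeyError(f"unknown hyperparameter: {dotted_key}")
        try:
            # Interpret numbers, booleans, etc.; fall back to the raw string.
            node[leaf] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            node[leaf] = raw_value
    return config


# Placeholder values, not the contents of conf/cifar10.conf.
config = {"max_epochs": 1, "optimizer": {"lr": 0.1}}
apply_overrides(config, ["max_epochs=100", "optimizer.lr=0.05"])
print(config)
```

Rejecting unknown keys, rather than silently adding them, means that a typo such as optimizer.lrr fails immediately instead of training with the default value.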
To list all valid hyperparameter names, you can run the following command:
pyhocon -i conf/cifar10.conf -f properties
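The names printed by this command are dotted paths into the nested configuration, which is the same form that -M expects. A minimal sketch of how such names arise from a nested structure is shown below. It works on a plain Python dictionary with placeholder values, and `flatten_keys` is a hypothetical helper, not part of pyhocon or this codebase.

```python
def flatten_keys(config: dict, prefix: str = "") -> list:
    """Return the dotted name of every leaf value in a nested dict."""
    names = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-sections, extending the dotted prefix.
            names.extend(flatten_keys(value, dotted))
        else:
            names.append(dotted)
    return names


# Placeholder values, not the contents of conf/cifar10.conf.
example = {"max_epochs": 1, "optimizer": {"lr": 0.1}}
print(flatten_keys(example))
```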
- We use NVIDIA DALI to accelerate data preprocessing on ImageNet (enable it with the flag data.use_dali) and the tfrecord format to store ImageNet (create the tfrecords with tools/make_tfrecord.py and enable them with the flag data.use_tfrecord).
Finally, enjoy the code.
@misc{chen2020image,
author = {Yaofo Chen},
title = {Image Classification Codebase},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/chenyaofo/image-classification-codebase}}
}