Example of Quantization Aware Training (QAT) with Ignite on CIFAR10

The model implementation is based on https://discuss.pytorch.org/t/evaluator-returns-nan/107972/3

In this example, we show how to use Ignite to train a neural network:

on 1 or more GPUs
compute training/validation metrics
log the learning rate, metrics, etc.
save the best model weights
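In Ignite, saving the best model weights is usually done with the `ignite.handlers.Checkpoint` handler, configured with a `score_function` (for example, validation accuracy) and `n_saved`. The stdlib sketch below only illustrates the "keep the n best checkpoints by score" logic behind such a setup. The `BestKeeper` class is hypothetical and is not Ignite's implementation.

```python
import heapq
from typing import List, Tuple


class BestKeeper:
    """Hypothetical helper: keep the n highest-scoring checkpoints (e.g. by val accuracy)."""

    def __init__(self, n_saved: int = 1) -> None:
        self.n_saved = n_saved
        # Min-heap of (score, name): the worst kept checkpoint sits at index 0.
        self._heap: List[Tuple[float, str]] = []

    def update(self, score: float, name: str) -> bool:
        """Return True if this checkpoint enters the top n and should be saved."""
        if len(self._heap) < self.n_saved:
            heapq.heappush(self._heap, (score, name))
            return True
        if score > self._heap[0][0]:
            # Evict the worst kept checkpoint; a real handler would delete its file here.
            heapq.heapreplace(self._heap, (score, name))
            return True
        return False

    def best(self) -> List[Tuple[float, str]]:
        """Kept checkpoints, best first."""
        return sorted(self._heap, reverse=True)


if __name__ == "__main__":
    keeper = BestKeeper(n_saved=2)
    # Illustrative per-epoch validation accuracies, not results from this example.
    for epoch, acc in enumerate([0.61, 0.74, 0.70, 0.79], start=1):
        keeper.update(acc, f"best_model_{epoch}.pt")
    print(keeper.best())
```

A min-heap keeps the weakest retained checkpoint at the front, so deciding whether a new score qualifies is a single comparison.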

Configurations:

single GPU
multi GPUs on a single node

Requirements:

pytorch-ignite: pip install pytorch-ignite
torchvision: pip install torchvision
tqdm: pip install tqdm
tensorboardX: pip install tensorboardX
python-fire: pip install fire
brevitas: pip install git+https://github.com/Xilinx/brevitas.git

Usage:

We can train, for example, ResNet-18 with 8-bit weights and activations.

Run the example on a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py run --model="resnet18_QAT_8b"

Note: torch.nn.DataParallel does not work with QAT (as of torch v1.7.1).

For details on accepted arguments:

python main.py run -- --help

To use an already downloaded dataset, set its path with the parameter:

--data_path="/path/to/cifar10/"
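Assuming `main.py` passes `--data_path` as the `root` argument of `torchvision.datasets.CIFAR10` (an assumption; check `utils.py`), that directory should contain the extracted `cifar-10-batches-py` folder from the CIFAR-10 "python version" archive. The hypothetical helper below is a quick stdlib check that a pre-downloaded copy has the expected files.

```python
from pathlib import Path
from typing import List

# File names of the CIFAR-10 "python version" archive, as extracted into
# <root>/cifar-10-batches-py/ (the layout torchvision's CIFAR10 dataset reads).
CIFAR10_FILES = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch", "batches.meta"]


def missing_cifar10_files(data_path: str) -> List[str]:
    """Hypothetical helper: list the expected CIFAR-10 files absent under data_path."""
    base = Path(data_path) / "cifar-10-batches-py"
    return [name for name in CIFAR10_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_cifar10_files("/path/to/cifar10/")
    if missing:
        print("Not found:", missing)
```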

Available models:

resnet18_QAT_8b - ResNet-18 with 8-bit weights and activations
resnet18_QAT_6b - ResNet-18 with 6-bit weights and activations
resnet18_QAT_5b - ResNet-18 with 5-bit weights and activations
resnet18_QAT_4b - ResNet-18 with 4-bit weights and activations
torchvision models
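To give a feel for what the bit widths above mean, here is a minimal stdlib sketch of symmetric, per-tensor uniform fake quantization (quantize, then dequantize), the kind of operation QAT simulates in the forward pass. It is a generic illustration that assumes a scale taken from the maximum absolute value; it is not Brevitas's implementation, which provides its own quantizers and scaling strategies.

```python
from typing import List


def fake_quantize(values: List[float], bits: int) -> List[float]:
    """Symmetric, per-tensor uniform fake quantization to a signed `bits`-bit grid."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax              # real-valued step between adjacent levels
    out = []
    for v in values:
        q = round(v / scale)            # nearest integer level (Python rounds ties to even)
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        out.append(q * scale)           # dequantize back to float
    return out


if __name__ == "__main__":
    weights = [0.8, -0.31, 0.05, -0.8]
    print(fake_quantize(weights, 8))
    print(fake_quantize(weights, 4))    # coarser grid: 0.05 collapses to 0.0
```

Fewer bits means a larger step (`scale`), so the rounding error per value grows; training with this error in the loop is what lets the network adapt to it.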

Distributed training

Single node, multiple GPUs

Let's start training on a single node with 2 GPUs:

# using torch.distributed.launch
python -u -m torch.distributed.launch --nproc_per_node=2 --use_env main.py run --backend="nccl" --model="resnet18_QAT_8b"
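With `--use_env`, `torch.distributed.launch` gives each process its local rank through the `LOCAL_RANK` environment variable (instead of a `--local_rank` argument), and it also sets `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT`. Ignite's distributed helpers read these for you; the stdlib sketch below only shows what each spawned process sees. The `launch_env` function is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def launch_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Hypothetical helper: read rank info set by `torch.distributed.launch --use_env`."""
    env = os.environ if env is None else env
    return {
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this node
        "rank": int(env.get("RANK", 0)),              # global process index
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


if __name__ == "__main__":
    # Without the launcher, the defaults describe a single process.
    print(launch_env())
```

With `--nproc_per_node=2` on one node, the first process sees local rank 0, rank 0 and world size 2, and the second sees local rank 1, rank 1 and world size 2.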

Using Horovod as distributed backend

Make sure Horovod is installed before running.

Let's start training on a single node with 2 GPUs:

# horovodrun
horovodrun -np=2 python -u main.py run --backend="horovod" --model="resnet18_QAT_8b"

or

# spawning the processes from inside the code
python -u main.py run --backend="horovod" --nproc_per_node=2 --model="resnet18_QAT_8b"
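Whichever launcher is used, each of the 2 processes trains on its own shard of the data. If `main.py` builds its data loaders with Ignite's `idist.auto_dataloader` (an assumption, as Ignite's CIFAR10 examples do), the configured `batch_size` and `num_workers` are treated as totals and split across processes. The hypothetical helper below mirrors that arithmetic in plain Python; it is not Ignite's code.

```python
from typing import Tuple


def per_process_loader_args(batch_size: int, num_workers: int,
                            world_size: int, nproc_per_node: int) -> Tuple[int, int]:
    """Hypothetical helper: split a global batch size / worker count across processes."""
    if world_size > 1:
        if batch_size >= world_size:
            batch_size //= world_size  # floor division of the global batch
        if num_workers >= nproc_per_node:
            # ceil division so every process keeps at least its share of workers
            num_workers = (num_workers + nproc_per_node - 1) // nproc_per_node
    return batch_size, num_workers


if __name__ == "__main__":
    # Illustrative values: a global batch of 512 and 12 workers on 2 GPUs.
    print(per_process_loader_args(512, 12, world_size=2, nproc_per_node=2))
```

Keeping the configured batch size global means the effective batch per optimizer step stays the same whether you run on 1 or 2 GPUs.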

Online logs

On TensorBoard.dev: https://tensorboard.dev/experiment/Kp9Wod3XR36Sg2I1gAh1cA/
