A PyTorch implementation of DoReFa-Net. The code is inspired by LaVieEnRoseSMZ and zzzxxxttt.
- python > 3.5
- torch >= 1.1.0
- torchvision >= 0.4.0
- tb-nightly, future (for tensorboard)
- nvidia-dali >= 0.12 (faster dataloader)
Quantized models are trained from scratch.
Model | W_bit | A_bit | Acc |
---|---|---|---|
resnet-18 | 32 | 32 | 94.71% |
resnet-18 | 4 | 4 | 94.36% |
resnet-18 | 1 | 4 | 93.87% |
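
Here W_bit and A_bit are the bit widths used for weights and activations (32 means full precision). As a rough illustration of what those bit widths mean, the plain-Python sketch below applies the forward quantization rules described in the DoReFa-Net paper to lists of floats. It is not this repository's code: the function names are made up for this example, and a PyTorch version would operate on tensors and pass gradients through the rounding step with a straight-through estimator.

```python
import math


def quantize_k(x, k):
    """Round x in [0, 1] to the nearest of 2**k evenly spaced levels."""
    n = 2 ** k - 1
    return round(x * n) / n


def quantize_weights(weights, k):
    """Forward pass of DoReFa-style weight quantization (hypothetical helper)."""
    if k == 32:
        # Full precision: weights are left untouched.
        return list(weights)
    if k == 1:
        # 1-bit weights: sign of each weight, scaled by the mean absolute value.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]
    # k-bit weights: squash with tanh, map to [0, 1], quantize, map back to [-1, 1].
    # Assumes at least one weight is nonzero, so max_abs > 0.
    squashed = [math.tanh(w) for w in weights]
    max_abs = max(abs(v) for v in squashed)
    return [2 * quantize_k(v / (2 * max_abs) + 0.5, k) - 1 for v in squashed]


def quantize_activations(acts, k):
    """Forward pass of DoReFa-style activation quantization (hypothetical helper)."""
    if k == 32:
        return list(acts)
    # Clip to [0, 1] before rounding to 2**k levels.
    return [quantize_k(min(max(a, 0.0), 1.0), k) for a in acts]


if __name__ == "__main__":
    print(quantize_weights([-2.0, -0.1, 0.0, 0.3, 2.0], 4))
    print(quantize_activations([-0.5, 0.2, 0.5, 1.7], 4))
```

With k bits, every quantized weight takes one of 2^k values in [-1, 1] and every quantized activation one of 2^k values in [0, 1].
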
Quantized models are trained from scratch.
Model | W_bit | A_bit | Top1 | Top5 |
---|---|---|---|---|
resnet-18 | 32 | 32 | 69.80% | 89.32% |
resnet-18 | 4 | 4 | 66.60% | 87.15% |
Download the ImageNet dataset and move the validation images into labeled subfolders. To do this, you can use the following script.
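
To make the target layout concrete, here is a minimal Python sketch of the "move validation images to labeled subfolders" step. It is not the script referred to above. It assumes you already have a mapping from validation file names to class folder names; the `labels` dictionary and the function name `sort_validation_images` are hypothetical and only serve the illustration.

```python
import shutil
from pathlib import Path


def sort_validation_images(val_dir, labels):
    """Move each listed image in val_dir into a subfolder named after its class.

    labels: dict mapping file name -> class folder name (hypothetical input).
    Returns the number of files that were moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, class_id in labels.items():
        src = val_dir / filename
        if not src.is_file():
            # Skip entries whose image is missing or was already moved.
            continue
        dst_dir = val_dir / class_id
        dst_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(dst_dir / filename))
        moved += 1
    return moved


# Example call (paths and labels are placeholders):
# sort_validation_images("/path/to/val", {"img_0001.JPEG": "class_a"})
```

The result is one subfolder per class, which is the directory layout that folder-based loaders such as `torchvision.datasets.ImageFolder` expect.
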
- To train the model:
  - CIFAR: `python3 cifar_train_eval.py`
  - ImageNet: `python3 imagenet_torch_loader.py --multiprocessing-distributed` or `python3 imagenet_dali_loader.py`
- To check the TensorBoard log, run `tensorboard --logdir='your_log_dir'`, then navigate to http://localhost:6006.
- To test the quantized model and BN fusion:
  - Convert to the quantized model for inference: `python3 test_fused_quant_model.py`
  - Test BN fusion on the float model: `python3 bn_fuse.py`
This fusion method is not suitable for quantized models. We will change the BN fusion in the future according to Section 3.2.2 of the paper.
The BN fusion timings below are not a rigorous benchmark, but they are enough to illustrate the effect qualitatively.
Model (on CPU) | Before fusion | After fusion |
---|---|---|
resnet-18 | 0.74 s | 0.51 s |
resnet-34 | 1.41 s | 0.92 s |
resnet-50 | 1.96 s | 1.02 s |
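
For readers unfamiliar with BN fusion on a float model, the sketch below shows the underlying arithmetic: the batch-norm scale and shift are folded into the weights and bias of the preceding layer, so one layer does the work of two at inference time. This is a plain-Python illustration on a small linear layer, not the contents of `bn_fuse.py`; all function names are hypothetical, and `eps=1e-5` is an assumed value.

```python
import math


def linear(x, weight, bias):
    """y[c] = sum_i weight[c][i] * x[i] + bias[c]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm using fixed running statistics."""
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weight, bias) of a single layer equivalent to layer + batch norm."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)  # per-output-channel scale
        fused_w.append([w * scale for w in row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


if __name__ == "__main__":
    w, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.3]
    gamma, beta, mean, var = [1.2, 0.8], [0.05, -0.1], [0.3, -0.2], [0.9, 1.5]
    x = [1.5, -2.0]
    fw, fb = fold_bn(w, b, gamma, beta, mean, var)
    print(batch_norm(linear(x, w, b), gamma, beta, mean, var))
    print(linear(x, fw, fb))  # should match the line above up to rounding
```

Note that the folded weights are rescaled per channel, so weights that were on a quantization grid before folding generally are not on it afterwards. This is a plausible reason why the float-model fusion does not carry over directly to quantized models.
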
- Train on ImageNet 2012
- Fold BN
- Test speedup from quantization and BN folding
- Deploy models to embedded devices
- ...