This is a re-implementation of Online Label Smoothing (OLS). The code is based on my understanding of the paper. If you find a bug, please report it on the Issues page.
import torch
from OLS import OnlineLabelSmoothing

ols_loss = OnlineLabelSmoothing(num_classes=1000, use_gpu=True)

# Training
for epoch in range(total_epoch):
    # train()
    # test()
    pass

# Saving
torch.save({'ols': ols_loss.matrix.cpu().data}, 'ols.pth')
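
To show what the soft-label matrix holds between epochs, here is a minimal, dependency-free sketch of the bookkeeping behind online label smoothing, following the description in the paper. It is not the code in `OLS.py`: the class `OnlineSoftLabels` and its methods `loss`, `accumulate`, and `end_epoch` are hypothetical names, plain Python lists stand in for tensors, and both the uniform starting point and the handling of classes with no correct prediction are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


class OnlineSoftLabels:
    """Hypothetical helper: one soft-label distribution per target class."""

    def __init__(self, num_classes):
        self.k = num_classes
        # soft[c] is the soft label applied to samples whose target is c.
        # Assumption: every class starts from a uniform distribution.
        self.soft = [[1.0 / num_classes] * num_classes for _ in range(num_classes)]
        self._reset()

    def _reset(self):
        # Per-class running sum of predicted distributions and sample counts.
        self.acc = [[0.0] * self.k for _ in range(self.k)]
        self.count = [0] * self.k

    def loss(self, logits, target):
        """Cross-entropy between the current soft label and the prediction."""
        log_probs = log_softmax(logits)
        return -sum(s * lp for s, lp in zip(self.soft[target], log_probs))

    def accumulate(self, logits, target):
        """Record the predicted distribution only if the sample is classified correctly."""
        if logits.index(max(logits)) != target:
            return
        for j, lp in enumerate(log_softmax(logits)):
            self.acc[target][j] += math.exp(lp)
        self.count[target] += 1

    def end_epoch(self):
        """Average the recorded predictions to get the next epoch's soft labels."""
        for c in range(self.k):
            if self.count[c] > 0:
                self.soft[c] = [v / self.count[c] for v in self.acc[c]]
            # Assumption: a class with no correct prediction keeps its old soft label.
        self._reset()
```

In a training loop, `loss` and `accumulate` would be called once per sample (or per batch, in a vectorized version) and `end_epoch` once after each epoch.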
- Python 3.7
- PyTorch 1.6.0
- GPU: Tesla V100 32GB * 1
num_classes: 1000
optimizer: SGD
init_lr: 0.1
weight_decay: 0.0001
momentum: 0.9
lr_gamma: 0.1
total_epoch: 250
batch_size: 256
num_workers: 20
random_seed: 2020
amp: True # automatic mixed-precision training, provided by PyTorch
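
To make `init_lr`, `lr_gamma`, and `total_epoch` concrete, the sketch below computes the per-epoch learning rate for the two schedules that appear in the results table. The function names are hypothetical. The milestones [75, 150, 225] come from the table; the linear warmup shape and the decay to zero in the cosine schedule are assumptions, since the config does not state them.

```python
import math

INIT_LR = 0.1
LR_GAMMA = 0.1
TOTAL_EPOCH = 250


def multi_step_lr(epoch, milestones=(75, 150, 225)):
    """Multiply the learning rate by LR_GAMMA once for every milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return INIT_LR * LR_GAMMA ** passed


def cosine_warmup_lr(epoch, warmup_epochs=5):
    """Linear warmup (assumed), then cosine decay towards zero (assumed)."""
    if epoch < warmup_epochs:
        return INIT_LR * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (TOTAL_EPOCH - warmup_epochs)
    return 0.5 * INIT_LR * (1.0 + math.cos(math.pi * progress))


# Print a few representative epochs (0-indexed) for both schedules.
for epoch in (0, 5, 75, 150, 225, 249):
    print(epoch, multi_step_lr(epoch), cosine_warmup_lr(epoch))
```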
- Single GPU
python --amp -s cos --loss ce ols --loss_w 0.5 0.5
- Multiple GPUs, single node
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --nproc_per_node=2 --master_addr --master_port 23456 \
    --multi-gpus 1 -nw 20 --amp -s multi --loss ce ols --loss_w 0.5 0.5
- Multiple GPUs, multiple nodes
# Limited computing resources
Although I used AMP (automatic mixed precision) to speed up training, a run still took nearly five days, so I did not run any other experiments with OLS. Other ImageNet training records are available on my blog.
| Model | Loss | Epochs | LR schedule | Acc@1 | Acc@5 |
| --- | --- | --- | --- | --- | --- |
| ResNet50 | CE | 250 | Multi Step [75,150,225] | 76.32 | 93.06 |
| ResNet50 | CE | 250 | COS with 5 epochs warmup | 76.95 | 93.27 |
| ResNet50 | 0.5*CE+0.5*OLS | 250 | Multi Step [75,150,225] | 77.27 | 93.47 |
| ResNet50 | 0.5*CE+0.5*OLS | 250 | COS with 5 epochs warmup | 77.79 | 93.79 |
| ResNet50 | LS(e=0.1) | 250 | COS with 5 epochs warmup | 77.62 | 93.75 |
| ResNet50 | LS(e=0.2) | 250 | COS with 5 epochs warmup | 77.89 | 93.74 |
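
The `CE`, `LS(e=...)`, and `0.5*CE+0.5*OLS` rows differ in the target distribution that the prediction is compared against and in how the loss terms are weighted. The sketch below illustrates the uniform label-smoothing target and a weighted sum of loss terms in plain Python. The helper names are hypothetical, and reading `e` as a smoothing factor spread over all classes, including the true one, is an assumption; it matches the convention of the `label_smoothing` argument of `torch.nn.CrossEntropyLoss` in recent PyTorch releases.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def soft_cross_entropy(logits, target_dist):
    """Cross-entropy between a target distribution and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(target_dist, log_softmax(logits)))


def smoothed_target(target, num_classes, eps):
    """(1 - eps) on the true class plus eps spread uniformly over all classes."""
    dist = [eps / num_classes] * num_classes
    dist[target] += 1.0 - eps
    return dist


def weighted_loss(losses, weights):
    """Weighted sum of loss terms, e.g. weights (0.5, 0.5) as in --loss_w 0.5 0.5."""
    return sum(w * l for w, l in zip(weights, losses))


logits = [2.0, 0.5, -1.0]
ce = soft_cross_entropy(logits, smoothed_target(0, 3, eps=0.0))  # plain CE
ls = soft_cross_entropy(logits, smoothed_target(0, 3, eps=0.1))  # LS(e=0.1)
# Here the LS term only stands in for the second loss; with OLS the soft
# target would come from the learned soft-label matrix instead.
total = weighted_loss([ce, ls], [0.5, 0.5])
```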