Study of RAdam, based on the paper "On the Variance of the Adaptive Learning Rate and Beyond" by Liu et al. (2019):
```
@article{liu2019radam,
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  journal={arXiv preprint arXiv:1908.03265},
  year={2019}
}
```
I also used code from the paper "On the Adequacy of Untuned Warmup for Adaptive Optimization" (Jerry Ma and Denis Yarats, arXiv:1910.04209, 2019), which implements Adam with warmup.
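For reference, here is a minimal sketch of the untuned linear warmup rule as I understand it from Ma and Yarats (the function name is mine, not from their code; the rule below is my reading of the paper and should be checked against it):

```python
def untuned_linear_warmup(step, beta2=0.999):
    """Untuned linear warmup factor omega_t = min(1, (1 - beta2) * t / 2).

    The effective learning rate at step t is lr * omega_t, so the full
    learning rate is reached after roughly 2 / (1 - beta2) steps
    (about 2000 steps for beta2 = 0.999). This is a sketch of my reading
    of Ma and Yarats, not their actual implementation.
    """
    return min(1.0, 0.5 * (1.0 - beta2) * step)
```

The appeal of this rule is that the warmup length is derived from β₂ alone, so no extra warmup hyperparameter needs tuning.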
In this project, I compare the performance of RAdam with that of Adam and Adam with warmup on the Fashion-MNIST dataset. More precisely, the code evaluates the robustness of these optimization algorithms to variations of the learning rate. My report delves deeper into the theory and the implementation.
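To illustrate the mechanism RAdam adds on top of Adam, the following is a small self-contained sketch (function name mine, not taken from the paper's code) of the variance rectification term r_t from Liu et al.:

```python
import math

def radam_rectification(step, beta2=0.999):
    """Return RAdam's rectification factor r_t for a given step,
    or None when rho_t <= 4: in that regime the variance of the
    adaptive learning rate is intractable and RAdam falls back to
    an un-adapted update (SGD with momentum).

    rho_inf = 2 / (1 - beta2) - 1
    rho_t   = rho_inf - 2 t beta2^t / (1 - beta2^t)
    r_t     = sqrt( (rho_t - 4)(rho_t - 2) rho_inf
                  / ((rho_inf - 4)(rho_inf - 2) rho_t) )
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t <= 4.0:
        return None
    return math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                     / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
```

With β₂ = 0.999, rectification only activates at step 5, and r_t grows toward 1 as training proceeds, which is why RAdam behaves like a built-in, automatic warmup.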
Required libraries (Python 3.6): torch 1.3.1, torchvision 0.4.2, pytorch-warmup 0.0.4, tensorflow 2.0.0, tensorboard 2.0.2.
Below are the performances obtained after training these three optimizers with different learning rates:
*(Plots: RAdam train loss and test accuracy; Adam train loss and test accuracy; Adam with warmup train loss and test accuracy.)*