Training Cost due to the EMA mechanism #50

Lancelot39 · 2022-11-25T05:52:49Z

Thanks for the nice work and for releasing the code! I have just tried it and found that the Diffusion-LM training code uses an EMA (exponential moving average) mechanism, which seems to limit how much the model parameters are updated. This may stabilize training, but it also appears to increase the training cost. If EMA were removed, would the performance of Diffusion-LM degrade significantly? Or could the training cost be reduced substantially?

I would very much like to know the motivation for using EMA in your approach. Looking forward to your response.
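
For context, here is a minimal sketch of how a weight EMA is commonly implemented in training loops. It is an assumption that the Diffusion-LM code follows this pattern; the names `ema_update`, `rate`, and the toy parameters are hypothetical and not taken from the repository. In this form the EMA is a separate shadow copy: the optimizer step on the trained parameters is unchanged, and the extra cost is one additional copy of the parameters plus one elementwise pass per step.

```python
def ema_update(ema_params, model_params, rate=0.9999):
    """Update the shadow copy in place: ema <- rate * ema + (1 - rate) * param."""
    for i, p in enumerate(model_params):
        ema_params[i] = rate * ema_params[i] + (1.0 - rate) * p


# Toy training loop with plain SGD on two scalar "parameters" (hypothetical values).
params = [1.0, -2.0]
ema = list(params)      # shadow copy, initialised from the model parameters
grads = [0.5, -0.5]     # constant gradients, only for illustration
lr = 0.1

for _ in range(3):
    # Ordinary optimizer step: not affected by the EMA in any way.
    params = [p - lr * g for p, g in zip(params, grads)]
    # The EMA copy trails the trained parameters; it is typically used for evaluation or sampling.
    ema_update(ema, params, rate=0.9)
```

Under this assumption, dropping the EMA would save the memory for the shadow copy and one cheap pass over the parameters per step, while the gradient computation itself would cost the same.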
