You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Thanks for such nice work and your kind released code! I have just tried it and found that the EMA mechanism has been used in your optimization of the Diffusion-LM code, which limits the update of the model parameters a lot. Indeed, such a way may stabilize the training process but also increase the training cost. I suppose once it has been removed, would the performance of Diffusion-LM degrade a lot? Or maybe the training cost could be further reduced a lot?
I am very expected to know the motivation of using EMA in your approach. Looking forward to your response.
The text was updated successfully, but these errors were encountered:
Thanks for such nice work and your kind released code! I have just tried it and found that the EMA mechanism has been used in your optimization of the Diffusion-LM code, which limits the update of the model parameters a lot. Indeed, such a way may stabilize the training process but also increase the training cost. I suppose once it has been removed, would the performance of Diffusion-LM degrade a lot? Or maybe the training cost could be further reduced a lot?
I am very expected to know the motivation of using EMA in your approach. Looking forward to your response.
The text was updated successfully, but these errors were encountered: