Training Cost due to the EMA mechanism #50

Open
Lancelot39 opened this issue Nov 25, 2022 · 0 comments

Comments

@Lancelot39

Thanks for the nice work and for releasing the code! I have tried it and noticed that the EMA mechanism is used during optimization in the Diffusion-LM code, which considerably slows the effective update of the model parameters. Such a mechanism may stabilize the training process, but it also increases the training cost. If it were removed, would the performance of Diffusion-LM degrade significantly? Or could the training cost be reduced substantially?
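For reference, the EMA update I am referring to keeps a shadow copy of the weights and blends the live weights into it after each optimizer step. A minimal PyTorch sketch of this standard scheme (the rate 0.9999 and the function name are illustrative, not taken from this repo's code):

```python
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, rate=0.9999):
    """Blend live weights into the shadow (EMA) copy:
    ema <- rate * ema + (1 - rate) * current."""
    for ema_p, p in zip(ema_params, model_params):
        ema_p.mul_(rate).add_(p, alpha=1 - rate)

# Illustrative usage, once per training step:
# ema_params = [p.detach().clone() for p in model.parameters()]
# optimizer.step()
# ema_update(ema_params, model.parameters())
```

The per-step overhead is one elementwise multiply-add per parameter, plus a second full copy of the weights in memory, which is the extra training cost I am asking about.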

I would very much like to know the motivation for using EMA in your approach. Looking forward to your response.
