SDE GAN example #80
Thanks for the update. Is the changelog necessary, as I've been documenting things in the release notes? I'll take a look at the other stuff tomorrow.
Haha, I honestly hadn't spotted the release notes. Agreed, the changelog is unnecessary. FYI this example still doesn't work yet. Still needs more tweaks. (The same was true when we wrote the paper; it took me like 2 months of tweaking to get it working. Hopefully we can do it quicker now, but this isn't surprising.)
Thanks for the update, @patrick-kidger! I'm assuming the example works now. I actually have further thoughts about this GAN model, though I'm quite busy before the xmas break and can't really afford to context-switch between different things. Would you mind if I go through this PR in a week's time? Aside, the build for Py3.6 seems to be failing due to numpy being incompatible after its latest release.
Yup, the example works now! The most important points were to change the initialisation and to switch from Adam to SGD. I'd be very interested to hear your thoughts on SDEs/GANs. After xmas is fine; likewise no rush on this PR. It looks like SDE-GANs are probably going to be in ICML rather than ICLR, after all... Aside, eurgh. Looks like the problem is that it's grabbing the SciPy pre-release, which requires Python 3.7+.
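(For illustration, a minimal sketch of the kind of change described above: the modules, learning rates, and scaling factor here are hypothetical stand-ins, not the example's actual ones.)

```python
import torch

# Hypothetical stand-ins for the example's generator and discriminator.
generator = torch.nn.Linear(3, 3)
discriminator = torch.nn.Linear(3, 1)

# Switch both players of the GAN game from Adam to plain SGD.
gen_optim = torch.optim.SGD(generator.parameters(), lr=1e-3)
dis_optim = torch.optim.SGD(discriminator.parameters(), lr=1e-3)

# Change the initialisation, e.g. by scaling down the initial weights.
with torch.no_grad():
    for p in generator.parameters():
        p.mul_(0.1)
```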
Not related to this thread in particular, but I found this during late night reading. Some food for thought regarding the log-ODE scheme.
Ah, thank you! I've actually come across this before. Unfortunately I don't think it composes with […]. At some point I expect we'll revisit the log-ODE method, as I believe James wants to write a paper on it. When that happens, one option would be to write a for loop over the vjp, in C++ parallelised with OpenMP.
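(As a rough illustration of the "for loop over the vjp" idea, in PyTorch rather than C++/OpenMP; `step` is a made-up stand-in for a single solver step, not torchsde's API.)

```python
import torch

def step(x, dt):
    # Made-up stand-in for one solver step (e.g. one log-ODE step).
    return x + dt * torch.tanh(x)

def solve_with_stepwise_vjp(x0, dt, n_steps, grad_out):
    # Forward pass: record intermediate states without building a graph.
    xs = [x0]
    with torch.no_grad():
        for _ in range(n_steps):
            xs.append(step(xs[-1], dt))
    # Backward pass: one vjp per step, walking the trajectory in reverse.
    adj = grad_out
    for x in reversed(xs[:-1]):
        x = x.detach().requires_grad_(True)
        y = step(x, dt)
        (adj,) = torch.autograd.grad(y, x, grad_outputs=adj)
    return xs[-1], adj  # final state, and grad_out pulled back to x0
```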
Thanks for the example! I've only read the code but have not played with it, so these are mostly just questions. Are there ways to simplify this example and potentially speed it up (e.g. by reducing the time horizon)?
examples/sde_gan.py
###################
# Sample the generator's initial noise and map it to the SDE's initial state.
init_noise = torch.randn(batch_size, self._initial_noise_size, device=ts.device)
x0 = self._initial(init_noise)
# TODO: step size
These step sizes seem quite large. What happens if we use a smaller step size? Is it just slower, or does it tend to not work as well?
In particular, this step size seems to be larger than the step size for generating the data. This seems a bit weird, unless I'm missing something obvious.
It's just slower. The step size used for generating the data needn't be connected to the step size we use in our model. (We don't output the data every step; only every 10 steps.)
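(To make the distinction concrete, a sketch using `torchsde.sdeint`; `true_sde`, `model_sde`, and `y0` are assumed to exist, and the numbers are illustrative rather than the example's.)

```python
import torch
import torchsde

data_dt = 1e-3
ts = torch.linspace(0., 1., 101)  # output times: one per 10 internal data steps

# The data is generated with a fine internal step, but recorded only at `ts`...
ys_data = torchsde.sdeint(true_sde, y0, ts, method='euler', dt=data_dt)
# ...while the model may integrate over the same `ts` with a coarser step.
ys_model = torchsde.sdeint(model_sde, y0, ts, method='euler', dt=10 * data_dt)
```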
This actually touches on what I've found to be the single biggest issue with neural SDEs: training them is really, really slow. Even with this example, which:
- is a simple toy dataset, actually generated from an SDE;
- has a large batch size;
- has a huge step size;
- isn't making adaptive steps;
- is backprop'ing through the solver, without the adjoint method;
- is on a decent GPU;
- has the benefit of having been tweaked for a few months now.
Even this still takes a substantial amount of time to run.
I'm not really sure what the solution is, to be honest. (Lots of GPUs would be one way.) Large step sizes are another way of ameliorating this; I suspect the prior on model space is pretty similar to that with small step sizes.
A few of your comments above are about the optimisation procedure. This is easily the thing that actually seems to matter most. I didn't try every combination of SGD/Adam/Adadelta/... with every combination of weight decay, SWA, etc., but the general summary of my observations is that nothing stood out as reliably better.

Judging from the GAN optimisation literature at the moment, it seems like no-one else has much idea either. SWA is obviously related to the stronger convergence guarantees that Cesaro means often get, or can be viewed as a trick to avoid the diffusive regime at the end of training. Weight decay and negative momentum seem related to the idea of solving a GAN as an ODE, but I suspect that's almost certainly suboptimal for min-max games, just like it's suboptimal for minimisation problems. All in all, things seem to be in a bit of a mess. It feels to me like all the basic pieces are there in the literature, and someone just needs to figure out the correct things to put together to get training working reliably. As it stands, this example represents a "good enough" trade-off between readability, efficacy, and sticking to the standard library without custom optimisers. There are a few more things I'd like to try, but I decided they were out of scope for an intentionally simple example.
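(For reference, a minimal sketch of the SWA idea mentioned above, using `torch.optim.swa_utils`; the module, loss, and schedule here are hypothetical, not what the example does.)

```python
import torch
from torch.optim.swa_utils import AveragedModel

generator = torch.nn.Linear(3, 3)          # hypothetical stand-in
optim = torch.optim.SGD(generator.parameters(), lr=1e-3)
swa_generator = AveragedModel(generator)   # running average of the weights

for step_idx in range(1000):
    optim.zero_grad()
    loss = generator(torch.randn(32, 3)).pow(2).mean()  # placeholder loss
    loss.backward()
    optim.step()
    if step_idx >= 500:                    # only average late in training
        swa_generator.update_parameters(generator)

# Evaluate with `swa_generator` (the Cesaro-style average), not `generator`.
```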
If you have any thoughts on the matter, I'd be very interested.
FYI even though I think this PR is done, I'm thinking of leaving it unmerged until the paper is on arXiv (so we have a better link than just the workshop paper).
Finally done with this PR and ready to merge it, I think. @lxuechen how does it look? |
Looks good. If you could rebase from master to get rid of the test failures, then I think it's mostly ready for merge; otherwise I can try to do the rebase some time over the weekend. Overall, the example seems somewhat more complicated than the others, but understandably training GANs isn't easy. Sorry again for this ultra-slow response.
No worries at all. The test failure looks to be because the version number isn't updated. I can bump the number if you really want, but as we're only touching the README and adding a new example file, I think it should be fine to just merge anyway. I've just done a small update to start using Fire, and to add the license at the top.
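(For context, the Fire change amounts to something like the following; the hyperparameter names here are invented, not the example's actual ones.)

```python
import fire

def main(batch_size=1024, steps=5000):
    ...  # train the SDE-GAN with these hyperparameters

if __name__ == '__main__':
    fire.Fire(main)
```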
Added an SDE-GAN example. Only a draft, as I'm not yet satisfied that it's working / that it's as fast as it could be.
I've added a couple of other things in here too. For one, there's now a CHANGELOG.txt file. (Anything I've left out?)
I've also tweaked the citation request to point at the new paper, reflecting the huge amount of work we've put into this repo for it.