SDE GAN example #80

Merged: 21 commits merged into master on Feb 26, 2021

Conversation

patrick-kidger (Collaborator)

Added an SDE-GAN example. Only a draft as I'm not satisfied that it's working yet / that it's as fast as it could be.

I've added a couple other things in here too. For one, I've added a CHANGELOG.txt file. (Anything I've left out?)
I've also tweaked the citation request with the new paper, to reflect the huge amount of work we've put into this repo for that paper.

@lxuechen (Collaborator)

Thanks for the update. Is the changelog necessary, as I've been documenting things in the release notes? I'll take a look at the other stuff tomorrow.

@patrick-kidger (Collaborator, Author)

Haha, I honestly hadn't spotted the release notes. Agreed, the changelog is unnecessary.

FYI this example doesn't work yet; it still needs more tweaks. (The same was true when we wrote the paper; it took me like two months of tweaking to get it working. Hopefully we can do it quicker now, but this isn't surprising.)

@lxuechen (Collaborator)

Thanks for the update, @patrick-kidger ! I'm assuming the example works now.

I actually have further thoughts about this GAN model, though I'm quite busy before the xmas break and can't really afford to context switch between different things.

Would you mind if I go through this PR in a week's time?

Aside, build for Py3.6 seems to be failing due to numpy being incompatible after its latest release.

@patrick-kidger (Collaborator, Author)

Yup, the example works now! The most important points were to change the initialisation and to switch from Adam to SGD.
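
(For anyone reading along: a minimal sketch of the sort of initialisation tweak meant here, namely shrinking the final linear layer of each vector field so that the untrained SDE produces tame paths early in training. The helper name and scale factor below are illustrative only, not necessarily what the example actually does.)

```python
import torch

def shrink_final_layer_(module: torch.nn.Module, scale: float = 0.1):
    # Scale down the last Linear layer in-place, so that the untrained vector
    # field is small and early generator samples stay well-behaved.
    linears = [m for m in module.modules() if isinstance(m, torch.nn.Linear)]
    with torch.no_grad():
        linears[-1].weight.mul_(scale)
        linears[-1].bias.mul_(scale)
```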

I'd be very interested to hear your thoughts on SDEs/GANs. After xmas is fine; likewise no rush on this PR. It looks like SDE-GANs are probably going to be in ICML rather than ICLR, after all...

Aside, eurgh. Looks like the problem is that it's grabbing the SciPy pre-release, which requires Python 3.7+.

@lxuechen (Collaborator)

Not related to this thread in particular, but I found this during late night reading. Some food for thought regarding the log-ODE scheme.

@patrick-kidger (Collaborator, Author) commented Dec 16, 2020

Ah, thank you! I've actually come across this before. Unfortunately I don't think it composes with autograd.grad in the desired way though -- from one of the PyTorch devs: "The current vmap prototype unfortunately does not support doing a batch vjp where the input and the v vector share a batch dimension."

At some point I expect we'll revisit the log-ODE method, as I believe James wants to write a paper on it. When that happens one option would be to write a for loop over the vjp, in C++ parallelised with OpenMP.
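
(In case it's useful later, a rough Python sketch of that workaround -- the C++/OpenMP version would be the same loop without the interpreter overhead. The function below is illustrative, not code from this repo.)

```python
import torch

def batched_vjp(f, x, v):
    # x: (batch, d) inputs; v: (batch, m) cotangents sharing the batch dimension.
    # Computes v[i]^T J_f(x[i]) for each i, with one autograd.grad call per sample.
    outs = []
    for xi, vi in zip(x, v):
        xi = xi.detach().requires_grad_(True)
        yi = f(xi)
        (gi,) = torch.autograd.grad(yi, xi, grad_outputs=vi)
        outs.append(gi)
    return torch.stack(outs)
```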

@lxuechen (Collaborator) left a comment

Thanks for the example! I've only read the code but haven't played with it, so these are mostly just questions. Are there ways to simplify this example and potentially speed up running it (e.g. by reducing the time horizon)?

examples/sde_gan.py (outdated):
###################
# Sample initial noise and map it through the initial network to get the SDE's starting state x0.
init_noise = torch.randn(batch_size, self._initial_noise_size, device=ts.device)
x0 = self._initial(init_noise)
# TODO: step size
@lxuechen (Collaborator)

These step sizes seem quite large. What happens if we use a smaller step size? Is it just slower, or does it tend to not work as well?

In particular, this step size seems to be larger than the step size for generating the data. This seems a bit weird, unless I'm missing something obvious.

@patrick-kidger (Collaborator, Author) commented Dec 22, 2020

It's just slower. The step size used for generating the data needn't be connected to the step size we use in our model. (We don't output the data every step; only every 10 steps.)

This actually touches on what I've found to be the single biggest issue with neural SDEs: training them is really, really slow. Even with this example, which:

  • is a simple toy dataset, actually generated from an SDE;
  • has a large batch size;
  • has a huge step size;
  • isn't making adaptive steps;
  • is backprop'ing through the solver, without the adjoint method;
  • is on a decent GPU;
  • has the benefit of having been tweaked for a few months now.

Even this still takes a substantial amount of time to run.

I'm not really sure what the solution is, to be honest. (Lots of GPUs would be one way.) Large step sizes are just one way of ameliorating this -- I suspect the prior on model space is pretty similar to that of small step sizes.
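
To make the step-size trade-off concrete, here's a purely illustrative, self-contained sketch of a coarse fixed-step solve with torchsde. The toy SDE, the sizes and the dt below are stand-ins rather than the example's actual settings.

```python
import torch
import torchsde

class ToySDE(torch.nn.Module):
    # Stand-in for the example's generator vector fields.
    sde_type = 'ito'
    noise_type = 'general'

    def __init__(self, state_size=4, noise_size=3):
        super().__init__()
        self._drift = torch.nn.Linear(state_size, state_size)
        self._diffusion = torch.nn.Linear(state_size, state_size * noise_size)
        self._noise_size = noise_size

    def f(self, t, x):
        return self._drift(x)

    def g(self, t, x):
        return self._diffusion(x).view(x.size(0), x.size(1), self._noise_size)

batch_size, state_size, t_size = 1024, 4, 64
sde = ToySDE(state_size)
x0 = torch.randn(batch_size, state_size)
ts = torch.linspace(0, t_size - 1, t_size)
# dt=1.0 is a deliberately coarse fixed step: one Euler step per output time.
xs = torchsde.sdeint(sde, x0, ts, method='euler', dt=1.0)  # (t_size, batch, state)
```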

examples/sde_gan.py: five further review threads, resolved.
@patrick-kidger (Collaborator, Author) commented Dec 22, 2020

A few of your comments above are about the optimisation procedure. This is easily the thing that actually seems to matter most.

I didn't try every combination of SGD/Adam/Adadelta/... with every combination of weight decay, SWA, etc, but as a general summary of my observations:

  • Adam/AMSGrad give losses that jump around all over the show during training, whilst Adadelta gives very well-behaved, smoothly-varying losses. However, only SGD manages to get anything approaching good convergence towards the end of training: the others tend to get stuck at some loss away from zero. (Probably related to the theoretical lack of convergence that these optimisers exhibit.)
  • Weight decay seems to help a little bit.
  • SWA can save you from the oscillating losses that Adam gives you, helps a fair chunk with SGD, but does basically nothing with Adadelta. I've only tried uniform weight averaging, not EMA.

Judging from the GAN optimisation literature at the moment, it seems like no-one else has much idea either. SWA is obviously related to the stronger convergence guarantees that Cesaro means often get / is a trick to avoid the diffusive regime at the end of training. Weight decay and negative momentum seem related to the idea of solving a GAN as an ODE, but I suspect that's almost certainly suboptimal for min-max games, just like it's suboptimal for minimisation problems.

All in all things seem to be in a bit of a mess. It feels to me like all the basic pieces are there in the literature, and someone just needs to figure out the correct things to put together to get it working reliably.

As it stands this example represents a "good enough" trade-off between readability / efficacy / use of only standard library without custom optimisers.
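
For concreteness, the "good enough" recipe above might look something like the sketch below: plain SGD with a little weight decay, plus uniform weight averaging via torch.optim.swa_utils. The models and hyperparameters are placeholders, not the example's actual values.

```python
import torch
from torch.optim.swa_utils import AveragedModel

generator = torch.nn.Linear(4, 4)        # placeholder models
discriminator = torch.nn.Linear(4, 1)

g_optim = torch.optim.SGD(generator.parameters(), lr=2e-4, weight_decay=0.01)
d_optim = torch.optim.SGD(discriminator.parameters(), lr=1e-3, weight_decay=0.01)

# Uniform (not EMA) weight averaging, as described above.
avg_generator = AveragedModel(generator)
avg_discriminator = AveragedModel(discriminator)

for step in range(100):
    ...  # generator/discriminator losses, backward passes, optimiser steps
    avg_generator.update_parameters(generator)
    avg_discriminator.update_parameters(discriminator)
```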

Things I'd like to try, but decided were out of scope for an intentionally simple example:

  • Negative momentum. Unfortunately the default PyTorch optimizers throw an error if you try it.
  • Adabound, which interpolates between Adam and SGD. Something interpolating between Adadelta and SGD would also be interesting.
  • Using Adadelta to begin training and then switching to SGD at the end.
  • Coming up with some custom optimiser that has the right blend of the above ideas!

If you have any thoughts on the matter I'd be very interested.

@patrick-kidger patrick-kidger changed the base branch from master to dev December 24, 2020 03:50
@patrick-kidger (Collaborator, Author)

FYI even though I think this PR is done, I'm thinking of leaving it unmerged until the paper is on arXiv (so we have a better link than just the workshop paper).

Base automatically changed from dev to master January 5, 2021 12:06
@patrick-kidger (Collaborator, Author) commented Feb 9, 2021

Finally done with this PR and ready to merge it, I think. @lxuechen how does it look?

@lxuechen (Collaborator)

> Finally done with this PR and ready to merge it, I think. @lxuechen how does it look?

Looks good. If you could rebase from master to get rid of the test failures, then I think it's mostly ready to merge; otherwise I can try to do the rebase some time over the weekend. Overall, the example seems somewhat more complicated than the others, but understandably training GANs isn't easy.

Sorry again for this ultra slow response.

@patrick-kidger (Collaborator, Author)

No worries at all.

The test failure looks to be because the version number isn't updated. I can bump the number if you really want, but as we're only touching the README and adding a new example file, I think it should be fine to just merge anyway.

I've just done a small update to start using Fire and to add the license at the top.

@patrick-kidger patrick-kidger marked this pull request as ready for review February 24, 2021 16:47
@lxuechen lxuechen merged commit f965bc9 into master Feb 26, 2021
@patrick-kidger patrick-kidger deleted the dev-sde-gan-example branch February 26, 2021 22:00