Exploring new optimī optimizers #819
-
Oh cool! That's a pretty nice test. Makes me think it could be useful for jamming more creativity into the model.
-
I added Lion to the mix: https://huggingface.co/mikaelh/flux-sanna-marin-v0.4-fp8-lion

The challenge with Lion is that it converges extremely fast. I ended up dropping the learning rate by 10x, increasing weight decay by 10x, increasing warmup steps to 400, and switching to a polynomial schedule. The final validation image at 6000 steps is looking good.
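Roughly what those Lion settings look like in code. This is a minimal sketch, not the actual trainer config: it assumes optimī's Lion, an AdamW baseline learning rate of 4e-4 (implied elsewhere in this thread by Adan's 1e-3 being 2.5x higher), a baseline weight decay of 0.1, and plain PyTorch schedulers standing in for whatever the trainer uses.

```python
import torch
from optimi import Lion

model = torch.nn.Linear(8, 8)  # stand-in for the LoRA parameters

optimizer = Lion(
    model.parameters(),
    lr=4e-5,           # assumed AdamW baseline of 4e-4, dropped by 10x
    weight_decay=1.0,  # assumed baseline of 0.1, increased by 10x
)

# 400 warmup steps, then polynomial decay over the remaining 5600 of 6000 steps
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=400)
decay = torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=5600, power=1.0)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[400]
)
```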
I did a little 3-way prompt comprehension test. AdamW seems to be the winner, with Lion also performing well. Adan is the worst, so maybe I need to try adjusting some parameters. The prompts:

sanna marin playing tennis
sanna marin running a marathon
sanna marin sitting on a plane
sanna marin opening a can in the kitchen
sanna marin petting a cat
-
I managed to improve results with Adan by setting beta3 to 0.999 (instead of 0.99); a sketch of the change follows the prompt list below. Adan is still producing results that are different from AdamW and Lion with the same seed. https://huggingface.co/mikaelh/flux-sanna-marin-v0.4-fp8-adan2

I added three new prompts to the comparison:

sanna marin playing tennis
sanna marin running a marathon
sanna marin sitting on a plane
sanna marin opening a can in the kitchen
sanna marin petting a cat
sanna marin catching a red ball
sanna marin picking up a strawberry
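For reference, the Adan tweak in code. A minimal sketch, assuming optimī's Adan takes a three-element betas tuple with a default of (0.98, 0.92, 0.99); everything else is carried over from the earlier run.

```python
import torch
from optimi import Adan

model = torch.nn.Linear(8, 8)  # stand-in for the LoRA parameters

optimizer = Adan(
    model.parameters(),
    lr=1e-3,                    # same learning rate as the earlier Adan run
    betas=(0.98, 0.92, 0.999),  # beta3 raised from the 0.99 default to 0.999
)
```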
-
I've been playing around with the new optimī optimizers. One of the new optimizers is Adan, which is supposed to be superior to AdamW while using more VRAM:
https://optimi.benjaminwarner.dev/optimizers/adan/
The Adan optimizer requires a higher learning rate. I ended up setting it to 1e-3, which is 2.5 times higher than with AdamW. As expected, Adan is clearly converging faster. Given how slow Flux training is, Adan seems worth considering.
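As a rough illustration of the difference, a minimal sketch (not the actual trainer code; the optimī imports and the tiny model are stand-ins, and the 4e-4 AdamW rate is just 1e-3 divided by 2.5):

```python
import torch
from optimi import AdamW, Adan

model = torch.nn.Linear(8, 8)  # stand-in for the LoRA parameters

# AdamW baseline: 1e-3 / 2.5 = 4e-4
adamw = AdamW(model.parameters(), lr=4e-4)

# Adan wants a higher learning rate; it also keeps extra state, hence more VRAM
adan = Adan(model.parameters(), lr=1e-3)
```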
The LoRAs are available on the Hugging Face Hub:
https://huggingface.co/mikaelh/flux-sanna-marin-v0.4-fp8-adamw-stochastic
https://huggingface.co/mikaelh/flux-sanna-marin-v0.4-fp8-adan
The exact config.env settings can also be found on the Hub.