Releases: bghira/SimpleTuner
v0.9.6.2 mixture-of-experts training
What's Changed
Mixture-of-Experts
Mixture-of-Experts training is now supported, complete with a brief tutorial on how to accelerate your training and start producing mind-blowing results.
- DeepSpeed fix (#424)
- Parquet backend fixes for different dataset sources
- Parquet backend JSON / JSONL support (see the sketch after this list)
- Updated check for aspect ratio mismatch to be more reliable by @bghira in #427
- minor bugfixes for sd2.x/controlnet/sdxl refiner training by @bghira in #428
- mixture-of-experts training via segmind models by @bghira in #429
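The Parquet backend items above concern reading captions and image dimensions from tabular metadata files rather than scanning images on disk. As a rough illustration of that idea only - the column names, file name, and helpers below are assumptions, not SimpleTuner's actual backend - here is a short Python sketch:

```python
# Hypothetical sketch of a tabular metadata lookup, in the spirit of a
# parquet/JSON/JSONL caption backend. Column names, file names and the
# helpers themselves are illustrative assumptions, not SimpleTuner's API.
import pandas as pd

def load_metadata(path: str) -> pd.DataFrame:
    """Load a metadata table from a parquet, JSON, or JSONL file."""
    if path.endswith(".parquet"):
        return pd.read_parquet(path)
    if path.endswith(".jsonl"):
        return pd.read_json(path, lines=True)
    return pd.read_json(path)

def lookup(df: pd.DataFrame, filename: str) -> dict:
    """Fetch the caption and original dimensions for one sample."""
    row = df.loc[df["filename"] == filename].iloc[0]
    # Cast to plain Python types: values pulled out of a pandas Series are
    # numpy scalars, the sort of edge case the width/height fixes address.
    return {
        "caption": str(row["caption"]),
        "width": int(row["width"]),
        "height": int(row["height"]),
    }

meta = load_metadata("dataset-metadata.parquet")
print(lookup(meta, "00001.png"))
```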
Full Changelog: v0.9.6.1...v0.9.6.2
v0.9.6.1
What's Changed
- remove info log line by @bghira in #418
- blip3: resume captioning an input file and only caption files that have not yet been captioned
- parquet backend: resolve retrieval of width/height from series columns
- documentation: improve phrasing for more inclusivity by @bghira in #419
- toolkit: new captioners, new captioning features for blip3
- parquet backend: better debug logging
- honor DELETE_PROBLEMATIC_IMAGES in the VAE cache backend when a read fails by @bghira in #420
- multigpu fixes
- cuda: update nvidia libs to cuda 12.1 / torch 2.3
- validations: noise scheduler wasn't being configured by @bghira in #422
- randomised bucketing should correct the intermediary size in a special way to ease the pain of implementation by @bghira in #423
- debiased bucket training should rebuild cache upon epoch end (implements #416) by @bghira in #424
- Fix retrieval of parquet captions when not using AWS backend by @bghira in #425
- parquet backend improvements and rebuilding buckets/vae cache on each epoch for randomised bucketing by @bghira in #426
Full Changelog: v0.9.6...v0.9.6.1
v0.9.6 - debias them buckets
debiased aspect bucketing
When training on large datasets of heterogeneous samples, you will discover a content bias among aspect ratios: vertical images contain portraits, widescreen shots are cinematic, and square images tend to be more artistic.
A new feature, `crop_aspect=random`, is introduced to combat this issue. A known issue in the implementation (#416) limits its usefulness for small datasets, but in its current state it is capable of de-biasing very large datasets.
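To make the idea concrete, here is a minimal conceptual sketch of randomised aspect bucketing: rather than always keeping an image in the bucket matching its native aspect ratio, a nearby bucket is chosen at random and the image is cropped (never stretched) to fit it. The bucket values and helper names are illustrative assumptions, not SimpleTuner's implementation.

```python
import random

# Candidate aspect buckets (width / height); the values are illustrative.
BUCKETS = [0.75, 1.0, 1.33, 1.78]

def pick_bucket(native_aspect: float, jitter: int = 1) -> float:
    """Choose the native bucket or a random neighbour, so portrait, square
    and widescreen content gets shuffled across aspect ratios."""
    nearest = min(range(len(BUCKETS)), key=lambda i: abs(BUCKETS[i] - native_aspect))
    choice = max(0, min(len(BUCKETS) - 1, nearest + random.randint(-jitter, jitter)))
    return BUCKETS[choice]

def crop_to_aspect(width: int, height: int, aspect: float) -> tuple[int, int]:
    """Centre-crop to the chosen aspect ratio rather than distorting the image."""
    if width / height > aspect:
        return int(height * aspect), height  # too wide: trim width
    return width, int(width / aspect)        # too tall: trim height

w, h = crop_to_aspect(1920, 1080, pick_bucket(1920 / 1080))
print(w, h)
```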
What's Changed
- prompt library: rewrite all prompts, focusing on concept diversity and density, reducing 'sameness' complaints about the prompt library
- logging: reduce logspam in the `INFO` log level
- aspect bucketing: ability to randomise aspect buckets without distorting the images (experimental)
- validations: ability to disable uncond generation for a slight speed-up on slow hardware when not necessary
- aspect bucketing: ability to customise the aspect resolution mappings and enforce the resolutions you wish to train on
- captioning toolkit: new scripts for gemini-pro-vision, paligemma 3B and BLIP3
- bugfix: dataloader metadata retrieval would occasionally return the wrong values if filenames match across multiple datasets
A majority of the changes were merged via #417
Full Changelog: v0.9.5.4...v0.9.6
v0.9.5.4 - controlnet training
What's Changed
Experimental ControlNet training support.
- invalidate bad caches when they fail to load by @bghira in #406
- controlnet training support (sdxl+sd2x) by @bghira in #407
- huggingface hub: skip errors when uploading model for SD 2.x and SDXL trainers by @bghira in #410
Full Changelog: v0.9.5.3c...v0.9.5.4
v0.9.5.3c img2img validations for the sdxl refiner
What's Changed
- sdxl refiner: option `--sdxl_refiner_uses_full_range` for training on all timesteps by @bghira in #401
- sdxl refiner: ability to validate on images using 20% denoise strength (see the sketch after this list)
- deepfloyd: stage II eval fixes
- factory should sleep when waiting for text embed write by @bghira in #402
- fixes #351 by adding --variant option by @bghira in #403
- more toolkit options for captioning: gemini pro, blip3 by @bghira in #404
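For context on the "20% denoise strength" validation mentioned above, a hedged sketch using the diffusers img2img pipeline; the model ID, file paths, and prompt are illustrative, and this is not SimpleTuner's validation code:

```python
# Illustrative only: refiner-style img2img validation at a low denoise strength.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

base_image = load_image("validation-input.png")  # hypothetical input path

# strength=0.2 means only the final ~20% of the noise schedule is re-denoised,
# so the refiner polishes the image rather than regenerating it.
result = pipe(
    prompt="a photo of a golden retriever in a meadow",
    image=base_image,
    strength=0.2,
).images[0]
result.save("validation-refined.png")
```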
Full Changelog: v0.9.5.3b...v0.9.5.3c
v0.9.5.3b - SDXL refiner training
What's Changed
- SDXL refiner training support - LoRA and full u-net. The base model's text embeds can't be reused; you must use a different directory.
- validations: completely refactored workflow
- huggingface hub: can now use `--push_checkpoints_to_hub` to upload all intermediary checkpoints
- dropout: improve implementation to bypass any issues with tokenizer setup that might result in an incorrect embed by @bghira in #388
- lr schedules: polynomial fixed / last_epoch being set correctly for the rest
- parquet backend will ignore missing captions
- deepfloyd: text encoder loading fixed
- sd2.x: tested, bugfixed. uncond text embeds excluded from zeroing
- huggingface hub: improved model card, --push_checkpoints_to_hub will push every saved model and validation image (tested with 168 validation prompts)
- mps: new pytorch nightly resolves some strange issues seen previously.
- mps: use 'auto' slice width for sd 2.x instead of null
- validations: refactored logic entirely, cleaned up and simplified to tie-in with huggingface hub uploader
- timestep schedule is now segmented by train_batch_size, ensuring we hit a broader distribution of timestep sampling for each mini-batch (see the sketch after this list) by @bghira in #391
- follow-up fixes from botched v0.9.5.3 build by @bghira in #397
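The timestep segmentation change is easiest to see with a small sketch: the schedule is split into train_batch_size contiguous segments and one timestep is drawn from each, so every mini-batch covers the full noise range rather than clustering by chance. This is a conceptual illustration, not the trainer's actual code.

```python
import torch

def segmented_timesteps(batch_size: int, num_train_timesteps: int = 1000) -> torch.Tensor:
    """Draw one timestep from each of `batch_size` contiguous segments so a
    mini-batch spans the whole schedule instead of relying on sampling luck."""
    edges = torch.linspace(0, num_train_timesteps, batch_size + 1)
    lows, highs = edges[:-1], edges[1:]
    offsets = torch.rand(batch_size)
    return (lows + offsets * (highs - lows)).long()

print(segmented_timesteps(batch_size=4))  # e.g. tensor([ 97, 318, 640, 912])
```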
Full Changelog: v0.9.5.2...v0.9.5.3b
v0.9.5.2 - hugging face hub upload fixes/improvements, minor vae encoding fixes
What's Changed
- huggingface hub model upload improvement / fixes
- validations double-run fix
- json backend image size microconditioning input fix (SDXL) by @bghira in #385
- bitfit restrictions / model freezing simplification
- updates to huggingface hub integration, automatically push model card and weights
- webhooks: minor log level fixes, other improvements. ability to debug image cropping by sending them to discord.
- resize and crop fixes for json and parquet backend edge cases (VAE encode in-flight) by @bghira in #386
Full Changelog: v0.9.5.1...v0.9.5.2
v0.9.5.1
v0.9.5 - now with more robust flavour
Finetuning Terminus XL Velocity v2
What's Changed
- New cropping logic is now working across the board for parquet/json backends. Images are always cropped now, even when `crop=false`, if necessary to maintain 8px or 64px alignment with the resulting dataset (see the sketch after this list).
  - Resulting image sizes and aspect ratios did not change for `resolution_type=area`
  - Resulting image sizes and aspect ratios did change for `resolution_type=pixel`
    - This was necessary to avoid stretching/squeezing images when aligning to the 64px interval
- Discord webhook support, see the TUTORIAL for information.
- "Sensible defaults" are now set for `minimum_image_size`, `maximum_image_size`, and `target_downsample_size` to avoid unexpected surprises, mostly when using `crop=true`, but also for some benefits when using `crop=false` as well.
- Image upscaling restrictions have been relaxed, but it will refuse to upscale an image beyond 25%, and will instead ask you to change the dataset configuration values.
- Image quality when training SDXL models has substantially improved thanks to the minimisation of the microconditioning input ranges:
  (image: finetuning a particularly poorly-performing Terminus checkpoint, showing reduced high-frequency patterning)
- Single-subject DreamBooth was benchmarked on SDXL with 30 diverse images, achieving great results in just 500 steps.
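As an illustration of the 8px/64px alignment behaviour described in the first item of this list, here is a minimal sketch (not the project's code; the helper names are assumptions): target edges are snapped to the nearest 64px multiple and the image is centre-cropped to the adjusted aspect ratio rather than stretched.

```python
def align_to_interval(width: int, height: int, interval: int = 64) -> tuple[int, int]:
    """Snap both edges to the nearest multiple of `interval` (minimum one step)."""
    def snap(v: int) -> int:
        return max(interval, round(v / interval) * interval)
    return snap(width), snap(height)

def crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Centre-crop the source to the destination aspect ratio; the resize that
    follows then preserves proportions instead of squeezing the image."""
    target_aspect = dst_w / dst_h
    if src_w / src_h > target_aspect:
        new_w = int(src_h * target_aspect)
        left = (src_w - new_w) // 2
        return left, 0, left + new_w, src_h
    new_h = int(src_w / target_aspect)
    top = (src_h - new_h) // 2
    return 0, top, src_w, top + new_h

print(align_to_interval(1013, 762))    # -> (1024, 768)
print(crop_box(1013, 762, 1024, 768))  # crop region before resizing
```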
Commits
- Convert image to accepted format for calculate_luminance by @Beinsezii in #376
- vae cache fix for SDXL / legacy SD training
- epoch / resume step fix for a corner case where the path to the training data includes the dataset name by @bghira in #377
- when `crop=false`, we will crop from the intermediary size to the target size instead of squishing
- set default min_image_size, maximum_image_size, and target_downsample_size values to 100%, 150%, and 150% of the value set for resolution by @bghira in #378
- resolved bugged-out null embed when dropout is disabled
- discord webhook support
- cuda/rocm: bugfix for eval on final legacy (sd 1.5/2.1) training validations
- avoid stretching/squeezing images by always cropping to maintain 8/64px alignment
- set default values for minimum_image_size, maximum_image_size, and target_downsample_size by @bghira in #379
Full Changelog: v0.9.5-beta...v0.9.5-beta2
v0.9.5-beta - optimized training, 3x speed-up
What's Changed
This release includes an experimental rewrite of the image handling code. Please report any issues.
- flexible pixel resize to 8 or 64 px alignment, no more rounding up where unnecessary by @bghira in #368
- more deepfloyd stage II fixes for model evaluation by @bghira in #369
- AMD/ROCm support by @Beinsezii in #373
- TrainingSample: refactor and encapsulate image handling, improving performance and reliability by @bghira in #374
- fix --aspect_bucket_rounding not being applied correctly
- rebuild image sample handling to be structured object-oriented logic
- fix early epoch exit problem
- max epochs vs max steps ambiguity reduced by setting default to 0 for one of them
- fixes for LoRA text encoder save/load hooks
- optimise trainer
- 300% performance gain by removing the torch anomaly detector (see the sketch after this list)
- fix dataset race condition where a single image dataset was not being detected
- AMD documentation for install, dependencies thanks to Beinsezii
- fix for wandb timestep distribution chart values racing ahead of reality by @bghira in #375
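The 300% figure refers to PyTorch's autograd anomaly detection, which adds bookkeeping to every backward operation and slows training considerably; it is only worth enabling while hunting NaN/Inf gradients. A minimal illustration (not SimpleTuner's code):

```python
import torch

# Anomaly detection records extra state for every autograd node, which is why
# leaving it enabled can cost a large multiple of the normal step time.
torch.autograd.set_detect_anomaly(False)  # fast path for normal training

# Enable it only temporarily when debugging NaN/Inf gradients:
with torch.autograd.detect_anomaly():
    x = torch.randn(4, requires_grad=True)
    loss = (x * 2).sum()
    loss.backward()
```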
Full Changelog: v0.9.4...v0.9.5-beta