-
You'll probably need to use a quantised base model, which frees up enough memory to experiment with adaptive optimisers like Prodigy, D-Adaptation, and Adafactor (though here we really just use it like a more efficient AdamW).
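For reference, a minimal sketch of what int8 quantisation of the base transformer looks like with optimum-quanto, which is roughly what `--base_model_precision=int8-quanto` asks the trainer to do. The model repo and which module you quantise are assumptions here, not the trainer's exact internals:

```python
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import quantize, freeze, qint8

# Load only the FLUX transformer (the part that dominates VRAM use).
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Quantise the weights to int8 and freeze them; any LoRA adapters added on top
# are typically kept in higher precision, which is what frees headroom for
# heavier optimisers.
quantize(transformer, weights=qint8)
freeze(transformer)
```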
-
@billnye2 How are you running the FLUX prompts with the new LoRA after you trained it?
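(For anyone landing here with the same question: one way to do this is with diffusers, assuming the LoRA was exported in a diffusers-compatible format. The output directory and weight filename below are placeholders, not the poster's actual setup.)

```python
import torch
from diffusers import FluxPipeline

# Load the base FLUX.1-dev model in bf16.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # optional: trade speed for lower VRAM use

# Attach the freshly trained LoRA (hypothetical path and filename).
pipe.load_lora_weights("output/models", weight_name="pytorch_lora_weights.safetensors")

image = pipe(
    "a handsome young girl leaning at the wall, wearing a red jacket",
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("lora_test.png")
```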
-
Think I finally got the kinks out; it's training now. The odd thing is that my config/config.env file isn't being picked up, so I had to pass everything as command-line options; if anyone has troubleshooting tips there, I'd appreciate them.
I'm running on an NVIDIA RTX 6000 Ada 48 GB GPU; at batch size 1 I'm getting around 10-12 seconds per iteration, with ~30 GB of VRAM used. One thing I can't figure out: I have 76 sample images, but it seems to pick up only 19 of them.
-
Could be a fluke, but FWIW I'm seeing better results after making a couple of changes in SimpleTuner:
This is with LR 2e-4, batch size 1, gradient accumulation steps 1, rank 32, AdamW8Bit, and a sine LR schedule with a 2000-step period, set up for 10k steps total, but I'm already seeing better results by 1300 steps than after several thousand steps on previous runs. LoRA, not DoRA, since Comfy couldn't load DoRAs last I checked.
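For anyone unsure what a sine schedule with a 2000-step period does to the learning rate, here's a small standalone sketch using PyTorch's LambdaLR. This is a conceptual illustration, not SimpleTuner's actual scheduler implementation; the base LR and period are just the values from the run above:

```python
import math
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for the LoRA parameters
optimizer = torch.optim.AdamW(params, lr=2e-4)  # base LR from the run above

period = 2000  # steps per full sine cycle

def sine_lr(step: int) -> float:
    # Scales the base LR by a factor between 0 and 1 over each 2000-step cycle.
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * step / period))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=sine_lr)

for step in range(3):
    optimizer.step()   # gradients omitted; this only demonstrates the schedule
    scheduler.step()
    print(step, scheduler.get_last_lr())
```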
-
Hey! This could be a simple mistake I've made in my settings. Running `bash train.sh` fails with:

```
subprocess.CalledProcessError: Command '['/workspace/SimpleTuner/.venv/bin/python', 'train.py', '--model_type=lora', '--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev', '--gradient_checkpointing', '--set_grads_to_none', '--gradient_accumulation_steps=1', '--resume_from_checkpoint=latest', '--snr_gamma=5', '--data_backend_config=config/multidatabackend.json', '--num_train_epochs=0', '--max_train_steps=30000', '--metadata_update_interval=65', '--use_8bit_adam', '--learning_rate=1e-5', '--lr_scheduler=constant', '--seed', '42', '--lr_warmup_steps=300', '--output_dir=output/models', '--inference_scheduler_timestep_spacing=trailing', '--training_scheduler_timestep_spacing=trailing', '--allow_tf32', '--mixed_precision=bf16', '--base_model_precision=int8-quanto', '--flux', '--train_batch=1', '--max_workers=32', '--read_batch_size=25', '--write_batch_size=64', '--caption_dropout_probability=0.1', '--torch_num_threads=8', '--image_processing_batch_size=32', '--vae_batch_size=4', '--validation_prompt=a handsome young girl leaning at the wall, wearing a red jacket', '--num_validation_images=1', '--validation_num_inference_steps=20', '--validation_seed=42', '--minimum_image_size=1024', '--resolution=1024', '--validation_resolution=1024', '--resolution_type=pixel', '--checkpointing_steps=100', '--checkpoints_total_limit=3', '--validation_steps=50', '--tracker_run_name=flux-winterwonderland', '--tracker_project_name=lora-training', '--validation_guidance=3.5', '--validation_guidance_rescale=0.0', '--validation_negative_prompt=']' returned non-zero exit status 2.
```
-
Here are some config.env recommendations. The Prodigy optimizer is currently the easiest option, since it doesn't require guessing a reasonable learning rate.
The adamw_bf16 optimizer is faster and uses less VRAM, but requires guessing a good learning rate. Here are some settings I'm currently testing:
The default setting … People with more than 24 GB of VRAM can try running without quantization (remove the quantization setting from config.env). Don't be alarmed if you see a few bad validation images; the training may still recover. Update: if you want to increase …
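To illustrate why Prodigy removes the LR-guessing step, here's a small standalone sketch using the upstream prodigyopt package. This follows the Prodigy library's own API rather than SimpleTuner's config, so treat it as a conceptual example:

```python
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(16, 16)  # stand-in for the LoRA parameters being trained

# With Prodigy you typically leave lr at 1.0 and let the optimizer estimate
# the effective step size on its own as training progresses.
optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)

x = torch.randn(4, 16)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```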
-
A bunch of updates have happened. If you're starting a new LoRA run, it may be better to begin with the new defaults in mind.
To match the x-flux trainer: …
-
So I've been testing my results, and most of my LoRAs turned out quite bad. Or so I thought. After some more extended testing, I think the big issue is that training might be breaking the CFG distillation, so regular CFG has to be reintroduced during sampling. With that, the quality of my LoRA outputs increased quite a bit.
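For context, "regular CFG" here means running both a conditional and an unconditional prediction at each sampling step and blending them, rather than relying on FLUX's distilled guidance embedding alone. A minimal conceptual sketch of that blend (the predict_noise callable and its arguments are hypothetical placeholders, not a real pipeline API):

```python
import torch

def classifier_free_guidance(
    predict_noise,          # hypothetical callable: (latents, embeds, t) -> noise prediction
    latents: torch.Tensor,
    cond_embeds: torch.Tensor,
    uncond_embeds: torch.Tensor,
    t: torch.Tensor,
    cfg_scale: float = 3.5,
) -> torch.Tensor:
    # Two forward passes per step: one with the prompt, one with the empty/negative prompt.
    noise_cond = predict_noise(latents, cond_embeds, t)
    noise_uncond = predict_noise(latents, uncond_embeds, t)
    # Standard CFG blend: push the prediction away from the unconditional direction.
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)
```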
-
Just starting this thread so people can contribute what they've learned about optimal hyperparameters etc. for training FLUX LoRAs.
For myself, using 10 images at a 1e-3 or 1e-4 learning rate learns a little, but burns heavily by 500-1000 steps; surprisingly, it fluctuates every few hundred steps between being badly burned and generating something coherent. A 1e-7 learning rate didn't really produce any change after 1000 steps.