Should we have a "train_native.py"? #1947

Open

6DammK9 opened this issue Feb 21, 2025 · 0 comments

6DammK9 commented Feb 21, 2025

This issue mainly concerns code structure.

Currently I'm porting "resume from an assigned epoch / iter" and the bundled validation loss to sdxl_train.py, to enable large-scale "native full finetune" of my SDXL base model. I'm working on the sd3 "WIP branch".

I have found that many basic / fundamental / general features are implemented in class NetworkTrainer, which is not accessible from *_train.py.

Meanwhile, regarding the ARB / latent cache related configuration (and even the implementation itself): I made my own scalable version of prepare_buckets_latents.py and built a huge latent dataset, only to realize that my configuration was close to being invalidated by the inconsistent magic numbers in verify_bucket_reso_steps (see the sketch below).
Moreover, super() calls amplify this issue in downstream applications / extensions, such as LyCORIS's "full bypass", which may hide the stack trace and the actual code dependency.
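
A sketch of the kind of centralized check I have in mind (the constant names and per-architecture step values below are hypothetical, not the actual numbers in train_util.py):

```python
# Sketch only: one source of truth for the bucket-reso divisibility rule,
# instead of magic numbers scattered across the arch-specific *_train.py.
# The values below are placeholders, NOT the real ones.
BUCKET_RESO_STEPS = {
    "sd1": 64,
    "sdxl": 32,
    "sd3": 16,
}

def verify_latent_cache_reso(width: int, height: int, arch: str) -> None:
    """Fail fast if a cached latent resolution violates the target
    architecture's divisibility rule, before training starts."""
    step = BUCKET_RESO_STEPS[arch]
    if width % step or height % step:
        raise ValueError(
            f"{width}x{height} is not divisible by {step} (arch={arch}); "
            "the latent cache would be invalid for this trainer."
        )
```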

Examining the newer code structure shared across train_*.py, maybe we should have a train_native.py to unify the implementation differences spread across the arch-specific *_train.py scripts.
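
Roughly what I have in mind (only NetworkTrainer exists today; BaseTrainer, SdxlNativeTrainer, and the method names below are hypothetical, for illustration only):

```python
# Hypothetical layout for a unified train_native.py: generic features
# (resume from epoch/iter, validation loss, ARB / latent cache handling)
# live in one base class, and each arch provides a thin subclass.

class BaseTrainer:
    """Owns the shared training loop currently duplicated across *_train.py."""

    def load_target_model(self, args):
        raise NotImplementedError  # arch-specific hook

    def train(self, args):
        model = self.load_target_model(args)
        # shared: dataloader prep, resume, validation loss, checkpointing
        ...

class SdxlNativeTrainer(BaseTrainer):
    def load_target_model(self, args):
        # SDXL-specific: U-Net plus the two text encoders
        ...

if __name__ == "__main__":
    # dispatch on CLI args to the arch-specific trainer
    SdxlNativeTrainer().train(args=None)  # placeholder invocation
```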

Any action that can mitigate this risk would be greatly appreciated.

PS: accelerator.skip_first_batches in sdxl_train.py "soon".
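
For reference, a minimal sketch of the mid-epoch resume, assuming a hypothetical initial_step restored from the saved train state (skip_first_batches itself is a real Accelerate API):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Toy dataset standing in for the cached-latent dataset.
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
train_dataloader = accelerator.prepare(DataLoader(dataset, batch_size=4))

initial_step = 10  # hypothetical: restored from the saved train state
# Fast-forward so the run resumes mid-epoch without replaying seen batches.
resumed_dataloader = accelerator.skip_first_batches(
    train_dataloader, num_batches=initial_step
)

for batch in resumed_dataloader:
    ...  # training step
```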
