Should we have a "train_native.py"? #1947

Open

6DammK9 opened this issue Feb 21, 2025 · 0 comments

6DammK9 commented Feb 21, 2025

This issue mainly concerns code structure.

Currently I'm porting "resume from an assigned epoch / iter" and the bundled validation loss to sdxl_train.py, to enable large-scale "native full finetune" of my SDXL base model. I'm working on the sd3 "WIP branch".

I have found that many basic / fundamental / general features are implemented in class NetworkTrainer, which is not accessible from *_train.py.

Meanwhile, regarding the ARB / latent cache related configuration (and even the implementation itself): I made my own scalable version of prepare_buckets_latents.py and built a huge latent dataset, only to realize that my configuration was close to being invalidated by the inconsistent magic numbers in verify_bucket_reso_steps (see the sketch below).
Moreover, super() calls amplify this issue in downstream applications / extensions, such as LyCORIS's "full bypass", which may hide the stack trace and the actual code dependency.
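
A sketch of the kind of centralized check I have in mind (the constant names and per-architecture step values below are hypothetical, not the actual numbers in train_util.py):

```python
# Sketch only: one source of truth for the bucket-reso divisibility rule,
# instead of magic numbers scattered across the arch-specific *_train.py.
# The values below are placeholders, NOT the real ones.
BUCKET_RESO_STEPS = {
    "sd1": 64,
    "sdxl": 32,
    "sd3": 16,
}

def verify_latent_cache_reso(width: int, height: int, arch: str) -> None:
    """Fail fast if a cached latent resolution violates the target
    architecture's divisibility rule, before training starts."""
    step = BUCKET_RESO_STEPS[arch]
    if width % step or height % step:
        raise ValueError(
            f"{width}x{height} is not divisible by {step} (arch={arch}); "
            "the latent cache would be invalid for this trainer."
        )
```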

Examining the newer code structure shared across train_*.py, maybe we should have a train_native.py to unify the implementation differences spread across the arch-specific *_train.py scripts.
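
Roughly what I have in mind (only NetworkTrainer exists today; BaseTrainer, SdxlNativeTrainer, and the method names below are hypothetical, for illustration only):

```python
# Hypothetical layout for a unified train_native.py: generic features
# (resume from epoch/iter, validation loss, ARB / latent cache handling)
# live in one base class, and each arch provides a thin subclass.

class BaseTrainer:
    """Owns the shared training loop currently duplicated across *_train.py."""

    def load_target_model(self, args):
        raise NotImplementedError  # arch-specific hook

    def train(self, args):
        model = self.load_target_model(args)
        # shared: dataloader prep, resume, validation loss, checkpointing
        ...

class SdxlNativeTrainer(BaseTrainer):
    def load_target_model(self, args):
        # SDXL-specific: U-Net plus the two text encoders
        ...

if __name__ == "__main__":
    # dispatch on CLI args to the arch-specific trainer
    SdxlNativeTrainer().train(args=None)  # placeholder invocation
```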

Any action that can mitigate this risk would be greatly appreciated.

PS: accelerator.skip_first_batches in sdxl_train.py "soon".
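
For reference, a minimal sketch of the mid-epoch resume, assuming a hypothetical initial_step restored from the saved train state (skip_first_batches itself is a real Accelerate API):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Toy dataset standing in for the cached-latent dataset.
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
train_dataloader = accelerator.prepare(DataLoader(dataset, batch_size=4))

initial_step = 10  # hypothetical: restored from the saved train state
# Fast-forward so the run resumes mid-epoch without replaying seen batches.
resumed_dataloader = accelerator.skip_first_batches(
    train_dataloader, num_batches=initial_step
)

for batch in resumed_dataloader:
    ...  # training step
```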
