SDXL seems to not train self_attn layers in Text Encoders #1952

Open
Nekotekina opened this issue Feb 25, 2025 · 3 comments · May be fixed by #1964
Comments

Nekotekina commented Feb 25, 2025

Hello, I noticed that the recent version only trains the MLP layers in the Text Encoders, whereas existing LoRAs, or LoRAs trained with the GUI version of kohya-ss (which uses an older version), seem to train all layers. Is this a mistake on my side? I couldn't find any option to control it.

This is what usually gets trained:

lora_te1_text_model_encoder_layers_0_mlp_fc1.alpha
lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_up.weight
lora_te1_text_model_encoder_layers_0_mlp_fc2.alpha
lora_te1_text_model_encoder_layers_0_mlp_fc2.lora_down.weight
lora_te1_text_model_encoder_layers_0_mlp_fc2.lora_up.weight
lora_te1_text_model_encoder_layers_0_self_attn_k_proj.alpha
lora_te1_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight
lora_te1_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight
lora_te1_text_model_encoder_layers_0_self_attn_out_proj.alpha
lora_te1_text_model_encoder_layers_0_self_attn_out_proj.lora_down.weight
lora_te1_text_model_encoder_layers_0_self_attn_out_proj.lora_up.weight
lora_te1_text_model_encoder_layers_0_self_attn_q_proj.alpha
lora_te1_text_model_encoder_layers_0_self_attn_q_proj.lora_down.weight
lora_te1_text_model_encoder_layers_0_self_attn_q_proj.lora_up.weight
lora_te1_text_model_encoder_layers_0_self_attn_v_proj.alpha
lora_te1_text_model_encoder_layers_0_self_attn_v_proj.lora_down.weight
lora_te1_text_model_encoder_layers_0_self_attn_v_proj.lora_up.weight

This is what I see when training with the newest version of sdxl_train_network.py:

lora_te1_text_model_encoder_layers_0_mlp_fc1.alpha
lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_up.weight
lora_te1_text_model_encoder_layers_0_mlp_fc2.alpha
lora_te1_text_model_encoder_layers_0_mlp_fc2.lora_down.weight
lora_te1_text_model_encoder_layers_0_mlp_fc2.lora_up.weight
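
For reference, here is one way to check which layers a saved LoRA actually contains. This is a minimal sketch, not part of the training scripts: it assumes the LoRA was saved as a .safetensors file (the path below is a placeholder) and that the safetensors package is installed.

# Sketch: count which text-encoder module types received LoRA weights in a saved file.
from collections import Counter
from safetensors import safe_open

path = "my_lora.safetensors"  # hypothetical path to the trained LoRA

kinds = Counter()
with safe_open(path, framework="pt", device="cpu") as f:
    for key in f.keys():
        if not key.startswith("lora_te"):      # lora_te1 / lora_te2 = SDXL text encoders
            continue
        module = key.split(".")[0]             # drop .alpha / .lora_down.weight / .lora_up.weight
        if "_self_attn_" in module:
            kinds["self_attn"] += 1
        elif "_mlp_" in module:
            kinds["mlp"] += 1
        else:
            kinds["other"] += 1

print(kinds)  # only 'mlp' entries show up when the self_attn layers were not trained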

AbstractEyes commented Feb 25, 2025

I've been primarily using locon for full finetunes or the unet, and lokr for attention, as they've yielded the best results without the need for regularization. The only problem is that you have to delete the metadata_cache and let it re-verify the cache if you want them to work after moving folders or directories once the latents have been cached to disk; there's an odd quirk where locon doesn't use the correct directory when finding images, and I haven't bothered to fix it. Until this problem is fixed, that's an option.

Nekotekina (Author) commented

@AbstractEyes Hello, is what I'm observing really a problem? The TE still gets trained. Maybe I can fix it myself.

kohya-ss (Owner) commented

Thank you for reporting. I will look into it soon.

Nekotekina added a commit to Nekotekina/sd-scripts that referenced this issue Mar 1, 2025
Should fix kohya-ss#1952
I added an alternative name for CLIPAttention. I have no idea why this name changed. Now it should accept both names.
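
For context on the fix: in sd-scripts, LoRA modules are attached to text-encoder submodules by matching their class names against a target list, and per the commit message above the CLIP self-attention block's class name changed between library versions (likely to CLIPSdpaAttention when the SDPA implementation is used, though that exact name is an assumption here), so only the MLP block still matched. The sketch below illustrates the idea; the variable and helper names are illustrative assumptions, not copied verbatim from lora.py.

# Sketch of the class-name matching idea (names here are assumptions, not verbatim from lora.py).
# Accepting both the old and the new attention class name restores the q/k/v/out projections
# as LoRA targets alongside the MLP. "CLIPSdpaAttention" is the assumed new name.
TEXT_ENCODER_TARGET_CLASSES = ["CLIPAttention", "CLIPSdpaAttention", "CLIPMLP"]

def find_lora_target_linears(text_encoder):
    """Return the names of Linear layers inside matched blocks (illustrative helper)."""
    targets = []
    for block_name, block in text_encoder.named_modules():
        if block.__class__.__name__ not in TEXT_ENCODER_TARGET_CLASSES:
            continue
        for child_name, child in block.named_modules():
            if child.__class__.__name__ == "Linear":   # q/k/v/out_proj, fc1/fc2
                targets.append(f"{block_name}.{child_name}")
    return targets

With only the MLP (or only the old attention name) in the list, the self_attn projections are silently skipped, which matches the key dump at the top of this issue.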
Nekotekina linked a pull request (#1964) on Mar 1, 2025 that will close this issue.