I noticed that in the PixArtMSBlock implementation, there is no normalization layer for cross-attention, while normalization layers exist for self-attention and MLP:
```python
self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)  # for self-attention
self.attn = AttentionKVCompress(...)
self.cross_attn = MultiHeadCrossAttention(...)  # no norm layer before/after
self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)  # for MLP
```
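For reference, here is a minimal, self-contained sketch of how the block's forward pass wires these layers together, following the adaLN-single modulation pattern (`scale_shift_table` + `t2i_modulate`) used in the PixArt-α code. It is not the actual implementation: `PixArtLikeBlock` is a made-up name, and the repo's `AttentionKVCompress` / `MultiHeadCrossAttention` are replaced by plain `nn.MultiheadAttention` stand-ins, so signatures differ. The point it illustrates is that `norm1` and `norm2` wrap the self-attention and MLP branches, while the cross-attention residual branch consumes `x` directly with no LayerNorm and no gate.

```python
import torch
import torch.nn as nn


class PixArtLikeBlock(nn.Module):
    """Minimal sketch of the PixArtMSBlock layer ordering (not the real implementation)."""

    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        # Self-attention branch: LayerNorm -> adaLN modulation -> attention
        self.norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Cross-attention branch: no normalization before or after
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # MLP branch: LayerNorm -> adaLN modulation -> MLP
        self.norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, int(hidden_size * mlp_ratio)),
            nn.GELU(approximate="tanh"),
            nn.Linear(int(hidden_size * mlp_ratio), hidden_size),
        )
        # adaLN-single table: 6 chunks of (shift, scale, gate) for self-attn and MLP
        self.scale_shift_table = nn.Parameter(torch.randn(6, hidden_size) / hidden_size**0.5)

    def forward(self, x, y, t):
        # x: (B, N, C) image tokens, y: (B, M, C) text tokens, t: (B, 6*C) timestep embedding
        B = x.shape[0]
        shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = (
            self.scale_shift_table[None] + t.reshape(B, 6, -1)
        ).chunk(6, dim=1)

        # Self-attention: normalized + modulated input, gated residual
        h = self.norm1(x) * (1 + scale_msa) + shift_msa
        x = x + gate_msa * self.attn(h, h, h, need_weights=False)[0]

        # Cross-attention: x goes in as-is -- no norm, no modulation, no gate
        x = x + self.cross_attn(x, y, y, need_weights=False)[0]

        # MLP: normalized + modulated input, gated residual
        h = self.norm2(x) * (1 + scale_mlp) + shift_mlp
        x = x + gate_mlp * self.mlp(h)
        return x


if __name__ == "__main__":
    block = PixArtLikeBlock(hidden_size=64, num_heads=4)
    x = torch.randn(2, 16, 64)   # image tokens
    y = torch.randn(2, 8, 64)    # text tokens
    t = torch.randn(2, 6 * 64)   # timestep embedding
    print(block(x, y, t).shape)  # torch.Size([2, 16, 64])
```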
I'm curious about the reasoning behind not using normalization for cross-attention, while having it for self-attention and MLP layers. What's the rationale for this architectural design?
Thanks for this great work!
binbinsh changed the title from "Question about missing normalization layer for cross-attention in PixArtMSBlock" to "Why doesn't cross-attention use normalization in PixArtMSBlock?" on Jan 11, 2025
The same pattern appears in the diffusers implementation:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L541