Enable flash attention #5

Open

s-ryosky opened this issue Dec 6, 2024 · 3 comments

@s-ryosky commented Dec 6, 2024

Thank you for sharing your code!

I am trying to apply flash attention to the MHA, as in StreamPETR, but the loss becomes NaN during training.
Have you ever encountered this issue?
If so, do you know of a solution?

@AlmoonYsl (Owner) commented

Hi @s-ryosky,
This problem is caused by the fp16 computation used in FlashAttention. You can try the memory-efficient attention implemented in xformers, which supports fp32 computation in the attention.
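A minimal sketch of that swap, assuming xformers is installed; the tensor shapes and names below are illustrative and not taken from this repository:

```python
import torch
import xformers.ops as xops

# Illustrative shapes: batch, tokens, heads, head dim (not this repo's values).
B, L, H, D = 2, 900, 8, 32
q = torch.randn(B, L, H, D, device="cuda", dtype=torch.float32)
k = torch.randn(B, L, H, D, device="cuda", dtype=torch.float32)
v = torch.randn(B, L, H, D, device="cuda", dtype=torch.float32)

# xformers' memory-efficient attention accepts fp32 inputs, unlike the
# FlashAttention kernels, which require fp16/bf16.
out = xops.memory_efficient_attention(q, k, v)  # (B, L, H, D)
```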

@s-ryosky (Author) commented Dec 9, 2024

@AlmoonYsl
Thank you for your reply. I'll look into it.

However, I don't think the object-wise position embedding itself is incompatible with flash attention.
Compared to StreamPETR, there seem to be differences in how inverse sigmoid and pos2posemb3d are used for the position embedding.
Do you think this could be one of the reasons for the training instability?

@AlmoonYsl (Owner) commented

> @AlmoonYsl Thank you for your reply. I'll look into it.
>
> However, I don't think the object-wise position embedding itself is incompatible with flash attention. Compared to StreamPETR, there seem to be differences in how inverse sigmoid and pos2posemb3d are used for the position embedding. Do you think this could be one of the reasons for the training instability?

I think there may be a numerical overflow problem when calculating the position embedding and its gradient in fp16.
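A possible mitigation, sketched below under the assumption that the embedding is a PETR-style sinusoidal encoding: force the inverse sigmoid and the sinusoidal embedding to run in fp32 even when the rest of the model runs under AMP, then cast the result back to the attention dtype. The helper below is an illustrative simplification, not this repository's pos2posemb3d.

```python
import torch

def inverse_sigmoid(x, eps=1e-5):
    # Clamp before the log so fp32 values stay finite.
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))

def pos_embed_fp32(pos, num_feats=128, temperature=10000):
    """Simplified PETR-style sinusoidal embedding, forced to fp32 under AMP.

    pos: (..., 3) normalized 3D coordinates in [0, 1].
    returns: (..., 3 * num_feats) embedding in fp32 (cast back as needed).
    """
    with torch.cuda.amp.autocast(enabled=False):
        pos = pos.float()
        dim_t = torch.arange(num_feats, dtype=torch.float32, device=pos.device)
        dim_t = temperature ** (2 * (dim_t // 2) / num_feats)
        emb = pos[..., None] / dim_t                        # (..., 3, num_feats)
        emb = torch.stack((emb[..., 0::2].sin(),
                           emb[..., 1::2].cos()), dim=-1)   # (..., 3, num_feats/2, 2)
        return emb.flatten(-3)                              # (..., 3 * num_feats)
```

Whether this alone removes the NaN depends on where the overflow actually occurs; keeping the whole attention computation in fp32 via xformers, as suggested above, is the more conservative option.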
