Hi,
Thanks for providing the Mamba implementation. I would like to know if there is a workaround for the computation of deltaA and deltaB_u that avoids the GPU out-of-memory issue. These are the parameters I used to create the Mamba instance:
d_model: 1024
n_layer: 4
d_state: 1024
expand: 2
The other parameters are set to their default values.
This results in a model of ~60M parameters. However, I run out of memory (max GPU memory = 24 GB) when training with a batch size of 256, or even as low as 64, and this probably happens because of the large matrix computations for deltaA and deltaB_u.
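For context, here is a rough back-of-the-envelope estimate of why these two tensors dominate memory. The sequence length and float32 dtype below are assumptions on my part, since they are not fixed by the config above:

```python
# Rough peak-memory estimate for deltaA / deltaB_u, each of shape (b, l, d_in, n).
# Assumptions: seq_len = 512 (not stated above), float32 activations.
batch, seq_len = 64, 512
d_inner = 2 * 1024        # expand * d_model
d_state = 1024
bytes_per_tensor = batch * seq_len * d_inner * d_state * 4
print(f"{bytes_per_tensor / 1e9:.0f} GB per tensor")  # ~275 GB, far beyond 24 GB
```

Even with a batch size of 1 that is still roughly 4 GB per tensor, before counting the activations kept for backpropagation.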
This repo is mostly meant for educational purposes, and I would suggest using the official repo for any training: https://github.com/state-spaces/mamba
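That said, if you want to keep experimenting with this implementation, one possible workaround is to form deltaA and deltaB_u one time step at a time inside the scan loop, rather than materializing the full (b, l, d_in, n) tensors up front. Below is a minimal sketch, assuming the selective_scan shapes used in this repo (u, delta: (b, l, d_in); A: (d_in, n); B, C: (b, l, n); D: (d_in,)). Note that during training autograd still stores per-step activations, so this mainly lowers peak allocation; the fused CUDA kernel in the official repo is the real fix.

```python
import torch
from einops import einsum

def selective_scan_stepwise(u, delta, A, B, C, D):
    """Sequential scan that forms deltaA / deltaB_u one step at a time
    instead of materializing full (b, l, d_in, n) tensors.
    Assumed shapes: u, delta: (b, l, d_in); A: (d_in, n);
    B, C: (b, l, n); D: (d_in,)."""
    b, l, d_in = u.shape
    n = A.shape[1]
    x = torch.zeros(b, d_in, n, device=u.device, dtype=u.dtype)
    ys = []
    for i in range(l):
        # Per-step tensors are only (b, d_in, n), so peak memory no longer scales with l.
        deltaA_i = torch.exp(einsum(delta[:, i], A, 'b d_in, d_in n -> b d_in n'))
        deltaB_u_i = einsum(delta[:, i], B[:, i], u[:, i],
                            'b d_in, b n, b d_in -> b d_in n')
        x = deltaA_i * x + deltaB_u_i
        ys.append(einsum(x, C[:, i], 'b d_in n, b n -> b d_in'))
    y = torch.stack(ys, dim=1)  # (b, l, d_in)
    return y + u * D            # D acts as a skip connection
```

Reducing d_state would also help: 1024 is far larger than the default of 16, and both tensors scale linearly with it.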