Hi,
Thanks for providing the Mamba implementation. I would like to know if there is a workaround for the computation of deltaA and deltaB_u that avoids the GPU out-of-memory issue. These are the parameters I used to create the Mamba instance:
d_model: 1024
n_layer: 4
d_state: 1024
expand: 2
The other parameters are set to their default values.
This results in a model of ~60M parameters. However, I run out of memory (max GPU memory = 24 GB) when training with a batch size of 256, or even as low as 64, and this probably happens because of the large matrix computations for deltaA and deltaB_u.
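For context, here is a rough back-of-the-envelope estimate of why these two tensors dominate memory. The sequence length and float32 dtype below are assumptions on my part, since they are not fixed by the config above:

```python
# Rough peak-memory estimate for deltaA / deltaB_u, each of shape (b, l, d_in, n).
# Assumptions: seq_len = 512 (not stated above), float32 activations.
batch, seq_len = 64, 512
d_inner = 2 * 1024        # expand * d_model
d_state = 1024
bytes_per_tensor = batch * seq_len * d_inner * d_state * 4
print(f"{bytes_per_tensor / 1e9:.0f} GB per tensor")  # ~275 GB, far beyond 24 GB
```

Even with a batch size of 1 that is still roughly 4 GB per tensor, before counting the activations kept for backpropagation.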
This repo is mostly meant for educational purposes, and I would suggest using the official repo for any training: https://github.com/state-spaces/mamba
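That said, if you want to keep experimenting with this implementation, one possible workaround is to form deltaA and deltaB_u one time step at a time inside the scan loop, rather than materializing the full (b, l, d_in, n) tensors up front. Below is a minimal sketch, assuming the selective_scan shapes used in this repo (u, delta: (b, l, d_in); A: (d_in, n); B, C: (b, l, n); D: (d_in,)). Note that during training autograd still stores per-step activations, so this mainly lowers peak allocation; the fused CUDA kernel in the official repo is the real fix.

```python
import torch
from einops import einsum

def selective_scan_stepwise(u, delta, A, B, C, D):
    """Sequential scan that forms deltaA / deltaB_u one step at a time
    instead of materializing full (b, l, d_in, n) tensors.
    Assumed shapes: u, delta: (b, l, d_in); A: (d_in, n);
    B, C: (b, l, n); D: (d_in,)."""
    b, l, d_in = u.shape
    n = A.shape[1]
    x = torch.zeros(b, d_in, n, device=u.device, dtype=u.dtype)
    ys = []
    for i in range(l):
        # Per-step tensors are only (b, d_in, n), so peak memory no longer scales with l.
        deltaA_i = torch.exp(einsum(delta[:, i], A, 'b d_in, d_in n -> b d_in n'))
        deltaB_u_i = einsum(delta[:, i], B[:, i], u[:, i],
                            'b d_in, b n, b d_in -> b d_in n')
        x = deltaA_i * x + deltaB_u_i
        ys.append(einsum(x, C[:, i], 'b d_in n, b n -> b d_in'))
    y = torch.stack(ys, dim=1)  # (b, l, d_in)
    return y + u * D            # D acts as a skip connection
```

Reducing d_state would also help: 1024 is far larger than the default of 16, and both tensors scale linearly with it.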