Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading extension module slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0... #47

Open
weiqiang13 opened this issue Aug 16, 2024 · 8 comments

Comments

@weiqiang13
Copy link

{'verbose': True, 'with_cuda': True, 'extra_ldflags': ['-L/home/junlong/anaconda3/envs/xlstm/lib', '-lcublas'], 'extra_cflags': ['-DSLSTM_HIDDEN_SIZE=128', '-DSLSTM_BATCH_SIZE=8', '-DSLSTM_NUM_HEADS=4', '-DSLSTM_NUM_STATES=4', '-DSLSTM_DTYPE_B=float', '-DSLSTM_DTYPE_R=nv_bfloat16', '-DSLSTM_DTYPE_W=nv_bfloat16', '-DSLSTM_DTYPE_G=nv_bfloat16', '-DSLSTM_DTYPE_S=nv_bfloat16', '-DSLSTM_DTYPE_A=float', '-DSLSTM_NUM_GATES=4', '-DSLSTM_SIMPLE_AGG=true', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0', '-DSLSTM_FORWARD_CLIPVAL_VALID=false', '-DSLSTM_FORWARD_CLIPVAL=0.0', '-U__CUDA_NO_HALF_OPERATORS', '-U__CUDA_NO_HALF_CONVERSIONS', '-U__CUDA_NO_BFLOAT16_OPERATORS', '-U__CUDA_NO_BFLOAT16_CONVERSIONS', '-U__CUDA_NO_BFLOAT162_OPERATORS__', '-U__CUDA_NO_BFLOAT162_CONVERSIONS__'], 'extra_cuda_cflags': ['-Xptxas="-v"', '-gencode', 'arch=compute_80,code=compute_80', '-res-usage', '--use_fast_math', '-O3', '-Xptxas -O3', '--extra-device-vectorization', '-DSLSTM_HIDDEN_SIZE=128', '-DSLSTM_BATCH_SIZE=8', '-DSLSTM_NUM_HEADS=4', '-DSLSTM_NUM_STATES=4', '-DSLSTM_DTYPE_B=float', '-DSLSTM_DTYPE_R=nv_bfloat16', '-DSLSTM_DTYPE_W=nv_bfloat16', '-DSLSTM_DTYPE_G=nv_bfloat16', '-DSLSTM_DTYPE_S=nv_bfloat16', '-DSLSTM_DTYPE_A=float', '-DSLSTM_NUM_GATES=4', '-DSLSTM_SIMPLE_AGG=true', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0', '-DSLSTM_FORWARD_CLIPVAL_VALID=false', '-DSLSTM_FORWARD_CLIPVAL=0.0', '-U__CUDA_NO_HALF_OPERATORS', '-U__CUDA_NO_HALF_CONVERSIONS', '-U__CUDA_NO_BFLOAT16_OPERATORS', '-U__CUDA_NO_BFLOAT16_CONVERSIONS', '-U__CUDA_NO_BFLOAT162_OPERATORS__', '-U__CUDA_NO_BFLOAT162_CONVERSIONS__']}
Using /home/junlong/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/junlong/.cache/torch_extensions/py311_cu121/slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0/build.ninja...
Building extension module slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0...
How to solve this problem?

@Ninlawat-Puhu
Copy link

I encountered the same problem.

@Ninlawat-Puhu
Copy link

@kpoeppel Hi, Could you advise how to solve ?

@vivianawoo
Copy link

me too,how to solve it~~

@matiashaggman
Copy link

I have this problem aswell. This only occurs if you include the sLSTM module in the xLSTM stack, using only mLSTM works. I tested this on the lightning platform with NVIDIA L4 GPU.

@calliope-pro
Copy link

Same here

@f-krause
Copy link

Same problem here (linux, ubuntu)!

A "work around" is to set backend="vanilla" in sLSTMLayerConfig, however this will ofc result in very slow learning

@kpoeppel
Copy link
Collaborator

Is it stuck in loading? Because I see no error in what you shared. If there is a loading error of the module you can clear your torch_extensions cache (typically $HOME/.cache/torch_extensions).
In any case, make sure that your GPU has compute capability >= 8.0 (Ampere). This is needed for bfloat16.

@f-krause
Copy link

f-krause commented Oct 10, 2024

Same problem here (linux, ubuntu)!

I made it work on my machine by changing the ninja version.

This config of versions currently works for me (though training is super slow on an A100 compared to transformers/mamba/torch lstm):

  • Ubuntu
  • Python 3.10.14
  • cuda 11.8
  • cudatoolkit=11.8.0
  • cudatoolkit-dev=11.7.0
  • pytorch 2.4.1
  • ninja 1.11.1.1
  • gcc 11.2.0
  • gxx_impl_linux-64 11.2.0
  • gxx_linux-64 11.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants