Loading extension module slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0... #47
Comments
I encountered the same problem.

@kpoeppel Hi, could you advise how to solve this?

Me too. How can this be solved?

I have this problem as well. It only occurs if you include the sLSTM module in the xLSTM stack; using only mLSTM works. I tested this on the Lightning platform with an NVIDIA L4 GPU.

Same here.

Same problem here (Linux, Ubuntu)! A workaround is to set backend="vanilla" in sLSTMLayerConfig, though this will of course result in very slow training.
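For anyone who wants to apply that workaround, this is roughly what the config change looks like. A minimal sketch: backend="vanilla" and sLSTMLayerConfig come from the comment above, while the enclosing sLSTMBlockConfig and the num_heads value are illustrative assumptions about the xlstm package, not something confirmed in this thread:

```python
from xlstm import sLSTMBlockConfig, sLSTMLayerConfig

# Workaround: select the pure-PyTorch backend so no CUDA extension is
# JIT-compiled. "vanilla" skips the slstm_HS... kernel build entirely,
# at the cost of much slower training.
slstm_block = sLSTMBlockConfig(
    slstm=sLSTMLayerConfig(
        backend="vanilla",  # the default "cuda" triggers the kernel build
        num_heads=4,        # illustrative value (assumption)
    ),
)
```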
Is it stuck in loading? I see no error in what you shared. If the module fails to load, you can clear your torch_extensions cache (typically $HOME/.cache/torch_extensions).
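For convenience, that cache reset can also be done from Python. A minimal sketch, assuming the default cache location named above (adjust the path if you have TORCH_EXTENSIONS_DIR set):

```python
import os
import shutil

# Delete the cached torch extension builds; the sLSTM kernel will be
# recompiled from scratch the next time the xLSTM stack is instantiated.
cache_dir = os.path.expanduser("~/.cache/torch_extensions")
shutil.rmtree(cache_dir, ignore_errors=True)
```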
I made it work on my machine by changing the ninja version. This combination of versions currently works for me (though training is very slow on an A100 compared to transformers/mamba/torch LSTM):
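The exact working version set is not listed above, but before pinning a different ninja you can check what is currently installed. A minimal stdlib-only sketch; the package names are the only assumption:

```python
from importlib.metadata import PackageNotFoundError, version

# Print the versions of the packages involved in the JIT kernel build;
# per the comment above, changing the ninja version resolved the issue.
for pkg in ("ninja", "torch", "xlstm"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")
```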
```
{'verbose': True, 'with_cuda': True, 'extra_ldflags': ['-L/home/junlong/anaconda3/envs/xlstm/lib', '-lcublas'], 'extra_cflags': ['-DSLSTM_HIDDEN_SIZE=128', '-DSLSTM_BATCH_SIZE=8', '-DSLSTM_NUM_HEADS=4', '-DSLSTM_NUM_STATES=4', '-DSLSTM_DTYPE_B=float', '-DSLSTM_DTYPE_R=nv_bfloat16', '-DSLSTM_DTYPE_W=nv_bfloat16', '-DSLSTM_DTYPE_G=nv_bfloat16', '-DSLSTM_DTYPE_S=nv_bfloat16', '-DSLSTM_DTYPE_A=float', '-DSLSTM_NUM_GATES=4', '-DSLSTM_SIMPLE_AGG=true', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0', '-DSLSTM_FORWARD_CLIPVAL_VALID=false', '-DSLSTM_FORWARD_CLIPVAL=0.0', '-U__CUDA_NO_HALF_OPERATORS__', '-U__CUDA_NO_HALF_CONVERSIONS__', '-U__CUDA_NO_BFLOAT16_OPERATORS__', '-U__CUDA_NO_BFLOAT16_CONVERSIONS__', '-U__CUDA_NO_BFLOAT162_OPERATORS__', '-U__CUDA_NO_BFLOAT162_CONVERSIONS__'], 'extra_cuda_cflags': ['-Xptxas="-v"', '-gencode', 'arch=compute_80,code=compute_80', '-res-usage', '--use_fast_math', '-O3', '-Xptxas -O3', '--extra-device-vectorization', '-DSLSTM_HIDDEN_SIZE=128', '-DSLSTM_BATCH_SIZE=8', '-DSLSTM_NUM_HEADS=4', '-DSLSTM_NUM_STATES=4', '-DSLSTM_DTYPE_B=float', '-DSLSTM_DTYPE_R=nv_bfloat16', '-DSLSTM_DTYPE_W=nv_bfloat16', '-DSLSTM_DTYPE_G=nv_bfloat16', '-DSLSTM_DTYPE_S=nv_bfloat16', '-DSLSTM_DTYPE_A=float', '-DSLSTM_NUM_GATES=4', '-DSLSTM_SIMPLE_AGG=true', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false', '-DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0', '-DSLSTM_FORWARD_CLIPVAL_VALID=false', '-DSLSTM_FORWARD_CLIPVAL=0.0', '-U__CUDA_NO_HALF_OPERATORS__', '-U__CUDA_NO_HALF_CONVERSIONS__', '-U__CUDA_NO_BFLOAT16_OPERATORS__', '-U__CUDA_NO_BFLOAT16_CONVERSIONS__', '-U__CUDA_NO_BFLOAT162_OPERATORS__', '-U__CUDA_NO_BFLOAT162_CONVERSIONS__']}
Using /home/junlong/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/junlong/.cache/torch_extensions/py311_cu121/slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0/build.ninja...
Building extension module slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0...
```
How can I solve this problem?