Pytorch 1.6-1.8 compatability - CUDA11/3090 ready #92
Jerry, did you install a nightly binary for your PyTorch? https://discuss.pytorch.org/t/rtx-3000-support/98158 I have built this in a Docker container using NVIDIA's CUDA 11.1 base image, then used the pip command in the link to install PyTorch compiled with RTX 3000 support, and it seems to work well (@MatthewHowe, what base image did you use?). From some googling, it looks like it could also be conflicting versions of different NVIDIA packages: nvcc, cuDNN, etc. |
Thanks, @XDynames. Following your suggestion to use a nightly binary, I used the pip command: And my PyTorch version now looks like this: And now I get a ninja-related error like this: I will double-check all the NVIDIA packages and find a way to solve the ninja problem. |
I used this Docker image: docker pull nvidia/cuda:11.1-devel-ubuntu18.04, then installed conda and torch-nightly. |
Hi @MatthewHowe, thanks for your great advice! I double-checked my CUDA installation and nvcc settings. After setting those environment variables properly, the errors related to ['nvcc', '-v'] no longer occur. However, ninja still reports a failure involving 'THCudaBlas_SgemmBatched'. The log looks like this: FAILED: /home/liurui/DCNv2/build/temp.linux-x86_64-3.7/home/liurui/DCNv2/DCN/src/cuda/dcn_v2_cuda.o Sorry, I fixed this problem by downgrading PyTorch from the 1.8 nightly binary to the 1.7 stable version. THCudaBlas_SgemmBatched was modified in recent versions, which caused this problem. It now works well and compiles successfully. Thanks again for Matthew's great work! |
Just looked into this: ATen lost this definition on 13 Nov. Maybe we should look into replacing SgemmBatched with a non-deprecated version for 1.8 support? |
the same error |
You can try downgrading PyTorch to 1.7 stable; it works fine for me. |
|
I have compiled successfully using pytorch1.7. Thanks. @jerryhitit @MatthewHowe |
I successfully compiled on Windows 10, CUDA 11.1 (RTX3090), and PyTorch 1.7. Thank you so much! |
@MatthewHowe Hi Matthew, I failed to compile using PyTorch 1.7 with RuntimeError: Error compiling objects for extension. I used your latest version, which supports PyTorch 1.7. gcc is 7.5.0, torch.cuda.is_available returns True, and CUDA home is not None. Error Message: The above exception was the direct cause of the following exception: Traceback (most recent call last): Could you help me? |
Double-check that your versions all line up: if you want to use CUDA 10.2, make sure cuDNN is the correct version and the PyTorch binary you are using is compiled with CUDA 10.2. |
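One way to check that the versions line up is to print them side by side. The following is a minimal sketch, not part of this PR; the helper name describe_environment is hypothetical, and it assumes PyTorch may or may not be importable in the current interpreter.

```python
import importlib
import os
import shutil


def describe_environment():
    """Collect the version facts that should agree before building DCNv2."""
    info = {
        # Toolkit directory the extension build is usually pointed at
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        # First nvcc found on PATH, or None if it is not visible
        "nvcc": shutil.which("nvcc"),
        "torch": None,
        "torch_cuda": None,
        "cudnn": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # PyTorch is not installed in this interpreter
    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was compiled with (None for CPU builds)
    info["torch_cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If torch_cuda and the toolkit behind nvcc disagree (for example 10.2 versus 11.1), that mismatch is the first thing to fix.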
Hi @XDynames, I solved this by modifying my Python interpreter file "anaconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py". But I ran into another gcc compile problem: running install Do you have any advice? The environment is the same. |
@MatthewHowe Hi MatthewHowe. Thanks for your great job, I successfully compiled on Ubuntu 18.04.5, CUDA 11.1 (RTX3090), and PyTorch 1.7.0. |
@ConnerWK Not to put too fine a point on it, but the code for DCN has become a bit messy. What we have done is replace low-level BLAS and CUDA BLAS function calls with higher-level ATen equivalents. We view this as a band-aid, so we've started working on a pure PyTorch nn.Module-based solution that will not require compiling. Currently we have deformable convolution V1/V2 passing all the unit tests from this code, but have yet to break ground on ROI pooling. Let me know if this is something you'd be interested in. |
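To illustrate what replacing a BLAS call with a higher-level equivalent means, here is a small sketch of the standard gemm contract, C = alpha * (A @ B) + beta * C, written in plain Python and compared against torch.addmm when PyTorch is available. This is an illustration only: gemm_reference is a hypothetical helper, and the sketch does not claim to show which ATen functions this PR actually uses.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Plain-Python version of the BLAS gemm contract: alpha * (a @ b) + beta * c."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
expected = gemm_reference(2.0, a, b, 0.5, c)

try:
    import torch
except ImportError:
    torch = None  # the reference above still runs without PyTorch

if torch is not None:
    # torch.addmm computes beta * input + alpha * (mat1 @ mat2), the same contract
    result = torch.addmm(
        torch.tensor(c), torch.tensor(a), torch.tensor(b), beta=0.5, alpha=2.0
    )
    assert torch.allclose(result, torch.tensor(expected))
```

The appeal of the high-level call is that it stays stable across PyTorch releases, whereas the internal TH/THC BLAS wrappers were removed.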
Can you help solve this problem? I compiled successfully with CUDA 11 and PyTorch 1.7 (RTX 3090), thank you very much @MatthewHowe, but I get this error: error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device -- Process 0 terminated with the following error: import torch ConvolutionParams |
Did you solve this problem? I have the same issue too. |
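The "no kernel image is available" error generally means the compiled binary contains no code for the GPU's compute capability. Below is a simplified sketch of that check. covers_device is a hypothetical helper, and the rule it encodes is a simplification of NVIDIA's compatibility rules: an sm_XY binary runs on devices with the same major version and an equal or higher minor version, and compute_XY PTX can be JIT-compiled for equal or higher capabilities. It assumes purely numeric suffixes such as sm_75.

```python
def covers_device(arch_list, capability):
    """Return True if any entry in arch_list can run on a GPU with this capability.

    arch_list holds strings such as "sm_75" or "compute_75";
    capability is a (major, minor) tuple such as (8, 6).
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True  # native binary for the same major architecture
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True  # PTX that the driver can JIT-compile forward
    return False
```

With a recent PyTorch installed, covers_device(torch.cuda.get_arch_list(), torch.cuda.get_device_capability()) applies the same check to the PyTorch binary itself; the DCNv2 extension needs equivalent coverage for the card it runs on (an RTX 3090 reports capability (8, 6)).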
With PyTorch 1.7 stable, make returns the following error: |
I used this version, https://github.com/lbin/DCNv2, which solved the "THCState_getCurrentStream" is undefined error. |
Is there a solution for compiling this branch for PyTorch 1.8 and CUDA 11.1 (from torch.version.cuda)? |
@hhcs9527 Not yet. We have a version of deformable convolution (not ROI pooling) that does work with those versions, but it currently does not work well in multi-GPU training (very slow). |
I hit the same error as you. |
Finally build successfully!!! |
Try this one: https://github.com/jinfagang/DCNv2_latest
------------------ Original message ------------------
From: "CharlesShang/DCNv2" ***@***.***>;
Sent: Wednesday, March 24, 2021, 6:54 PM
Subject: Re: [CharlesShang/DCNv2] Pytorch 1.6-1.8 compatability - CUDA11/3090 ready (#92)
@rathaROG commented on this pull request.
In DCN/src/cpu/dcn_v2_cpu.cpp:
> @@ -1,5 +1,6 @@ #include <vector>
@haruishi43 Thanks for your reply! Can you help verify this? This is what I did:
git clone https://github.com/CharlesShang/DCNv2.git
cd DCNv2
git remote add tteepe https://github.com/tteepe/DCNv2.git
git fetch tteepe
git checkout origin/master
python setup.py build develop
I also made some changes in dcn_v2_im2col_cuda.cu and dcn_v2_psroi_pooling_cuda.cu:
ceil() to ceilf()
floor() to floorf()
round() to roundf()
My system: Windows 10, CUDA 11.1.1, cuDNN 8.1.1.33, Anaconda Python 3.6.12 with these packages:
Cython @ file:///C:/ci/cython_1614014892888/work
cython-bbox==0.1.3
torch==1.8.0
torchaudio==0.8.0
torchvision==0.9.0
All errors:
cpu\dcn_v2_cpu.cpp(82): error C3861: 'THFloatBlas_gemm': identifier not found
cpu\dcn_v2_cpu.cpp(101): error C3861: 'THFloatBlas_gemm': identifier not found
cpu\dcn_v2_cpu.cpp(176): error C3861: 'THFloatBlas_gemm': identifier not found
cpu\dcn_v2_cpu.cpp(216): error C3861: 'THFloatBlas_gemm': identifier not found
cpu\dcn_v2_cpu.cpp(224): error C3861: 'THFloatBlas_gemv': identifier not found
cuda/dcn_v2_cuda.cu(107): error: identifier "THCState_getCurrentStream" is undefined
cuda/dcn_v2_cuda.cu(126): error: identifier "THCudaBlas_SgemmBatched" is undefined
cuda/dcn_v2_cuda.cu(273): error: identifier "THCudaBlas_Sgemm" is undefined
cuda/dcn_v2_cuda.cu(279): error: identifier "THCState_getCurrentStream" is undefined
cuda/dcn_v2_cuda.cu(324): error: identifier "THCudaBlas_Sgemv" is undefined
What did I miss?
Please help me. I really want to make it work.
|
Upgrade your Visual Studio to 2019.
------------------ Original message ------------------
From: "CharlesShang/DCNv2" ***@***.***>;
Sent: Thursday, March 25, 2021, 7:31 PM
Subject: Re: [CharlesShang/DCNv2] Pytorch 1.6-1.8 compatability - CUDA11/3090 ready (#92)
@rathaROG commented on this pull request.
In DCN/src/cpu/dcn_v2_cpu.cpp:
> @@ -1,5 +1,6 @@ #include <vector>
Hi @haruishi43, I still had one more problem:
[4/4] "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64/link.exe" dcn_v2_cuda.o dcn_v2_cpu.o dcn_v2_im2col_cpu.o dcn_v2_psroi_pooling_cpu.o dcn_v2_cuda.cuda.o dcn_v2_im2col_cuda.cuda.o dcn_v2_psroi_pooling_cuda.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib ***@***.***@at@@***@***.***@***@***.***_N1@Z torch_cuda_cpp.lib ***@***.***@at@@yahxz torch.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib/x64" cudart.lib /out:DCNv2_gpu.pyd
FAILED: DCNv2_gpu.pyd
"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64/link.exe" dcn_v2_cuda.o dcn_v2_cpu.o dcn_v2_im2col_cpu.o dcn_v2_psroi_pooling_cpu.o dcn_v2_cuda.cuda.o dcn_v2_im2col_cuda.cuda.o dcn_v2_psroi_pooling_cuda.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib ***@***.***@at@@***@***.***@***@***.***_N1@Z torch_cuda_cpp.lib ***@***.***@at@@yahxz torch.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib/x64" cudart.lib /out:DCNv2_gpu.pyd
Creating library DCNv2_gpu.lib and object DCNv2_gpu.exp
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __enclave_config
DCNv2_gpu.pyd : fatal error LNK1120: 1 unresolved externals
ninja: build stopped: subcommand failed.
|
Thanks for the clue! I already had the latest version of VS2019, but I realized that I hadn't added the path of VS2019's cl.exe to the system PATH variable. In case you're interested, I also made a Windows-ready repo here: |
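A quick way to confirm that the compiler is visible before building is to look it up on PATH. This is a minimal standard-library sketch; find_build_tools is a hypothetical helper, and the default tool list (cl, nvcc, ninja) is an assumption about what this build needs on Windows.

```python
import shutil


def find_build_tools(names=("cl", "nvcc", "ninja")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_build_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If cl resolves to an older Visual Studio directory, or to nothing at all, the linker step is likely to pick up the wrong toolchain.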
Thanks for the repo 👍
------------------ Original message ------------------
Sent: Friday, March 26, 2021, 6:46 AM
Subject: Re: [CharlesShang/DCNv2] Pytorch 1.6-1.8 compatability - CUDA11/3090 ready (#92)
|
I have the same issue and it was fixed by the following steps:
|
@JohnPekl have you tried running |
@haruishi43 , I haven't tried running |
I downgraded PyTorch to 1.7 stable, but it doesn't work for me. |
Hi @MatthewHowe I appreciate your work! Do you have specific commands to compile |
Hello, I hit the same error as you. Could you explain in more detail how to use conda install? I recreated the env and reinstalled PyTorch with: conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch, |
My system specs:
In my conda environment I
And if I upgrade the PyTorch version, it fails to build. Any solutions or suggestions are much appreciated. Thank you for your time. |
seems ok after using https://github.com/tteepe/DCNv2 |
Hello, could you please share your versions of PyTorch and cudatoolkit, and the output of gcc -v and nvcc -v? I'm stuck on the error "command 'g++' failed with exit status 1" and it is driving me crazy. @XDynames |
Downgrading PyTorch to 1.7.0 stable did not fix it for me. Sad... |
How did you modify it? |
How did you solve this problem? |
Excuse me, how did you compile it successfully? Did you limit the computing power of the graphics card to 75? Can you help me? |
I forked from https://github.com/MatthewHowe/DCNv2, which includes fixes for compatibility with torch 1.7 / 1.8 and CUDA 10 / 11. My environment: |
Hi, I hit the same error (compute_86) with "torch 1.7 / 1.8 with cuda 10 / 11". +) Fix: In my case, I didn't use |
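A compute_86 error typically comes from an nvcc that predates that architecture; support for it arrived with CUDA 11.1. PyTorch's extension builder reads the TORCH_CUDA_ARCH_LIST environment variable to decide which architectures to compile for, so one possible workaround is to set it explicitly. The sketch below is an assumption about how one might do that; format_arch_list is a hypothetical helper, and the capabilities chosen are only an example.

```python
import os


def format_arch_list(capabilities, ptx_for_newest=True):
    """Build a TORCH_CUDA_ARCH_LIST value such as "7.5;8.0+PTX" from (major, minor) pairs."""
    ordered = sorted(set(capabilities))
    entries = [f"{major}.{minor}" for major, minor in ordered]
    if ptx_for_newest and entries:
        # Embed PTX for the newest target so newer GPUs can JIT-compile the kernels
        entries[-1] += "+PTX"
    return ";".join(entries)


# Example: a toolkit that cannot target 8.6 can still build 8.0 plus PTX.
os.environ["TORCH_CUDA_ARCH_LIST"] = format_arch_list([(7, 5), (8, 0)])
# The build (python setup.py build develop) must then run with this environment.
```

Upgrading the toolkit to one that supports the card natively remains the cleaner fix; restricting the list is a fallback.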
Hi, can someone help me to see why I get the following error here? error: command '/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc' failed: No such file or directory: '/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc' The full error message is as follows: |
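The path in that error looks like several directories joined with ':' and then '/bin/nvcc' appended, which is what would happen if CUDA_HOME held a PATH-style list instead of a single directory. That is an assumption, since the environment is not shown. A small sketch that checks for this; check_cuda_home is a hypothetical helper:

```python
import os


def check_cuda_home(value):
    """Return a list of problems with a CUDA_HOME value (an empty list means it looks fine)."""
    if not value:
        return ["CUDA_HOME is not set"]
    if os.pathsep in value:
        # A PATH-style list cannot be used as a single toolkit directory
        return ["CUDA_HOME contains several paths; it must be one directory"]
    nvcc = os.path.join(value, "bin", "nvcc")
    if not (os.path.isfile(nvcc) or os.path.isfile(nvcc + ".exe")):
        return [f"no nvcc found at {nvcc}"]
    return []


if __name__ == "__main__":
    print(check_cuda_home(os.environ.get("CUDA_HOME")) or "CUDA_HOME looks usable")
```

If the check reports several paths, setting CUDA_HOME to exactly one toolkit directory (for example /usr/local/cuda-11.7) before rebuilding is the likely fix.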
Thank you, wonderful work |
Modified from the pull request by @half-potato for compatibility with torch 1.6. Replaced THBlas functions with ATen tensor functions.
Tested for torch 1.7 and 1.8 with cuda 10 and 11.
Worked with RTX2080 and RTX3090.
@XDynames