Pytorch 1.6-1.8 compatability - CUDA11/3090 ready #92

Open: wants to merge 3 commits into base: master


MatthewHowe

Modified from @half-potato's pull request for compatibility with torch 1.6. Replaced THBlas functions with ATen tensor functions.
Tested with torch 1.7 and 1.8 on CUDA 10 and 11.
Worked with RTX 2080 and RTX 3090.
@XDynames

@MatthewHowe
Author

#90 #89 #88 #74

@jerryhitit

Hi Matthew,
I have an RTX 3090 and cloned your project as you modified it 14 hours ago.
Running ./make.sh still fails with this error:
nvcc fatal : Unsupported gpu architecture 'compute_86'
image

I'm on Ubuntu 18.04, CUDA 11.1, PyTorch 1.7, and gcc 7.5.0 / g++ 7.5.0.

I guess the error is probably caused by CUDA. Any help would be appreciated!

@XDynames

XDynames commented Nov 14, 2020

Jerry, did you install a nightly binary for your PyTorch? https://discuss.pytorch.org/t/rtx-3000-support/98158

I built this in a Docker container using Nvidia's CUDA 11.1 base image, then used the pip command from the link to install a PyTorch build compiled with RTX 3000 support, and it seems to work well. (@MatthewHowe, what base image did you use?)

From some googling, it looks like it could also be conflicting versions of different Nvidia packages: nvcc, cuDNN, etc.
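
One quick way to rule out the toolkit itself: nvcc only accepts 'compute_86' (the RTX 30-series architecture) from CUDA 11.1 onward, while CUDA 11.0 stops at 'compute_80'. Below is a minimal sketch of that check. The table and the function name are illustrative assumptions, the table covers only the architectures mentioned in this thread, and the version you pass in should be that of whichever nvcc is first on your PATH.

```python
# Illustrative table: first CUDA toolkit release whose nvcc accepts each
# architecture mentioned in this thread (not an exhaustive list).
MIN_TOOLKIT_FOR_ARCH = {
    "compute_75": (10, 0),  # Turing, e.g. RTX 2080
    "compute_80": (11, 0),  # Ampere, e.g. A100
    "compute_86": (11, 1),  # Ampere, e.g. RTX 3090
}


def toolkit_supports(arch, toolkit_version):
    """Return True if a toolkit at (major, minor) `toolkit_version` knows `arch`."""
    return toolkit_version >= MIN_TOOLKIT_FOR_ARCH[arch]


if __name__ == "__main__":
    # Show which of a few toolkit versions can target an RTX 3090.
    for version in [(10, 2), (11, 0), (11, 1)]:
        print(version, toolkit_supports("compute_86", version))
```

If the check fails for the nvcc that the build actually invokes, the "Unsupported gpu architecture" error is expected even when a newer toolkit is installed elsewhere on the machine.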

@jerryhitit

Thanks, @XDynames.
I originally installed PyTorch with:
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

Following your suggestion to use a nightly binary, I ran this pip command:
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html

My PyTorch version now looks like this:
image

Now I get a ninja-related error like this:
image
which results in: RuntimeError: Error compiling objects for extension

I will double check all the NVIDIA packages and find a way to solve the ninja problem.
Thanks again!

@MatthewHowe
Author

MatthewHowe commented Nov 14, 2020

I used this docker image: docker pull nvidia/cuda:11.1-devel-ubuntu18.04, then installed conda and torch-nightly.
I then cloned and compiled DCNv2. This could be an issue with CUDA 11.0 or some other conflicting packages.
When DCN doesn't compile, the error that actually caused the failure is usually above your screen cap. If you run ./make.sh again, the parts that already compiled will be skipped, which makes it clearer what is causing the issue.
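
If the log is too long to scroll through, the first real error can also be pulled out programmatically. This is a minimal sketch, assuming the build output has been saved to a string or file; the regular expression is an assumption based on the nvcc and g++ messages quoted in this thread and does not cover every compiler.

```python
import re

# Matches diagnostics such as 'file.cu(126): error: ...', 'g++: error: ...'
# or 'nvcc fatal   : ...'. Python exception names like 'RuntimeError:' are
# skipped because there is no word boundary before 'Error' in them.
ERROR_PATTERN = re.compile(r"\b(error|fatal)\b\s*:", re.IGNORECASE)


def first_error_line(log_text):
    """Return the first line of a build log that looks like an error, or None."""
    for line in log_text.splitlines():
        if ERROR_PATTERN.search(line):
            return line.strip()
    return None
```

For example, saving the output with ./make.sh > build.log 2>&1 and passing the file contents to first_error_line would surface the compiler message rather than the trailing "ninja: build stopped" line.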

@jerryhitit

jerryhitit commented Nov 15, 2020

Hi @MatthewHowe, thanks for your great advice!

I double checked my CUDA installation and nvcc settings. After setting those environment variables properly, the build no longer produces the corresponding errors like ['nvcc', '-v'].

However, ninja still reports a failure related to 'THCudaBlas_SgemmBatched'.
It seems to be a new problem.

The log is like this:

FAILED: /home/liurui/DCNv2/build/temp.linux-x86_64-3.7/home/liurui/DCNv2/DCN/src/cuda/dcn_v2_cuda.o
/usr/local/cuda-11.1/bin/nvcc -DWITH_CUDA -I/home/liurui/DCNv2/DCN/src -I/home/liurui/anaconda3/envs/FairMOT/lib/python3.7/site-packages/torch/include -I/home/liurui/anaconda3/envs/FairMOT/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/liurui/anaconda3/envs/FairMOT/lib/python3.7/site-packages/torch/include/TH -I/home/liurui/anaconda3/envs/FairMOT/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.1/include -I/home/liurui/anaconda3/envs/FairMOT/include/python3.7m -c -c /home/liurui/DCNv2/DCN/src/cuda/dcn_v2_cuda.cu -o /home/liurui/DCNv2/build/temp.linux-x86_64-3.7/home/liurui/DCNv2/DCN/src/cuda/dcn_v2_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -ccbin g++ -std=c++14
/home/liurui/DCNv2/DCN/src/cuda/dcn_v2_cuda.cu(126): error: identifier "THCudaBlas_SgemmBatched" is undefined

Sorry, I fixed this problem by downgrading from the PyTorch 1.8 nightly binary to the 1.7 stable version. THCudaBlas_SgemmBatched was changed in the recent version, which caused this problem.

It works well and compiles successfully.

Thanks again for Matthew's great work!

@XDynames

XDynames commented Nov 15, 2020

Just looked into this: ATen lost this definition on 13 Nov.

Maybe we should look into replacing SgemmBatched with a non-deprecated version for 1.8 support?
pytorch/pytorch#47987

@Shank2358

pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html

the same error

@jerryhitit

You can try downgrading PyTorch to the 1.7 stable version; it works fine for me.

@Shank2358

Thank you. I will try it again.

@Shank2358

I have compiled successfully using pytorch1.7. Thanks. @jerryhitit @MatthewHowe

@duanzhiihao

I successfully compiled on Windows 10, CUDA 11.1 (RTX3090), and PyTorch 1.7. Thank you so much!

@KiedaTamashi

@MatthewHowe Hi Matthew, compiling with PyTorch 1.7 failed with RuntimeError: Error compiling objects for extension.

I used your latest version, which supports PyTorch 1.7.
My environment (in an Anaconda virtual env):
image

gcc 7.5.0
ninja 1.10.2
ubuntu 18.04
python 3.7
pytorch 1.7
cudatoolkit 10.2

torch.cuda.is_available() returns True and CUDA_HOME is not None.

Error Message:
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1522, in _run_ninja_build
env=env)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/subprocess.py", line 481, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "setup.py", line 69, in <module>
cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 653, in build_extensions
build_ext.build_extensions(self)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 482, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1238, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/NAS/home01/tanzhenwei/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1538, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Could you help me?

@XDynames

XDynames commented Dec 4, 2020

Double check that your versions all line up - if you want to use CUDA 10.2 make sure CUDNN is the correct version and the pytorch binary you are using is compiled with CUDA 10.2
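
A minimal sketch of that check, assuming you feed it the text printed by nvcc --version and the string from torch.version.cuda. The function names are illustrative, and strict equality is a deliberately conservative assumption: some minor-version mismatches may still build.

```python
import re


def nvcc_release(nvcc_version_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_version_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def versions_line_up(torch_cuda_version, nvcc_version_output):
    """True if the CUDA version PyTorch was built with equals the nvcc release."""
    return torch_cuda_version == nvcc_release(nvcc_version_output)
```

In practice the two inputs would come from torch.version.cuda and from running nvcc --version with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout inside the same environment that runs the build.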

@KiedaTamashi

Hi @XDynames, I solved this by modifying a file in my Python interpreter's environment: "anaconda3/envs/py37/lib/python3.7/site-packages/torch/utils/cpp_extension.py"

But then I ran into another gcc compile problem.

running install
running bdist_egg
running egg_info
writing DCNv2.egg-info/PKG-INFO
writing dependency_links to DCNv2.egg-info/dependency_links.txt
writing top-level names to DCNv2.egg-info/top_level.txt
reading manifest file 'DCNv2.egg-info/SOURCES.txt'
writing manifest file 'DCNv2.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building '_ext' extension
Emitting ninja build file /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.10.2
g++ -pthread -shared -B /NAS/home01/tanzhenwei/anaconda3/envs/py37/compiler_compat -L/NAS/home01/tanzhenwei/anaconda3/envs/py37/lib -Wl,-rpath=/NAS/home01/tanzhenwei/anaconda3/envs/py37/lib -Wl,--no-as-needed -Wl,--sysroot=/ /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/vision.o /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cpu/dcn_v2_cpu.o /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cpu/dcn_v2_im2col_cpu.o /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cpu/dcn_v2_psroi_pooling_cpu.o /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cuda/dcn_v2_cuda.o /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.o /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cuda/dcn_v2_psroi_pooling_cuda.o -L/NAS/home01/tanzhenwei/anaconda3/envs/py37/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda 
-o build/lib.linux-x86_64-3.7/_ext.cpython-37m-x86_64-linux-gnu.so
g++: error: /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cuda/dcn_v2_cuda.o: No such file or directory
g++: error: /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.o: No such file or directory
g++: error: /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/build/temp.linux-x86_64-3.7/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2/DCN/src/cuda/dcn_v2_psroi_pooling_cuda.o: No such file or directory
error: command 'g++' failed with exit status 1

Do you have any advice? The environment is the same.

@ConnerWK

@MatthewHowe Hi MatthewHowe. Thanks for your great work. I successfully compiled on Ubuntu 18.04.5, CUDA 11.1 (RTX 3090), and PyTorch 1.7.0.
There are still some other packages that I need to compile manually. Are there any guidelines, principles, or rules for porting source code from CUDA 10 (or even earlier versions) to CUDA 11, so that I can compile it in my current environment? Although I browsed the changed files, I still have no idea how to do this properly.
Would you mind providing some guidance? Looking forward to your reply.

@XDynames

XDynames commented Dec 17, 2020

@ConnerWK Not to put too fine a point on it, but the code for DCN has become a bit messy. What we did was replace the low-level BLAS and CUDA BLAS function calls with higher-level ATen equivalents.

We see this as a band-aid, so we've started working on a pure PyTorch nn.Module-based solution that will not require compiling. Currently we have deformable convolution V1/V2 passing all the unit tests from this code, but have yet to break ground on ROI pooling.

Let me know if this is something you'd be interested in.

@WangJian981002

Can you help solve this problem? I compiled successfully with CUDA 11 and PyTorch 1.7 (RTX 3090). Thank you very much, @MatthewHowe. At run time I get:

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1607370156314/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
/opt/conda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 33 leaked semaphores to clean up at shutdown
len(cache))
Traceback (most recent call last):
File "C_ddp.py", line 349, in <module>
main()
File "C_ddp.py", line 109, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/wj/Detection/CenterNetV2/C_ddp.py", line 277, in main_worker
center_loss, center_fuse_loss, scale_loss, offset_loss = model({'img':img , 'label':label , 'heatmap_t':heatmap_t , 'hm_FuseClass_t':hm_FuseClass_t})
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wj/Detection/CenterNetV2/nets/resnet_dcn_model.py", line 35, in forward
out=self.backbone(x)[0]
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wj/Detection/CenterNetV2/networks/resnet_dcn.py", line 261, in forward
x = self.deconv_layers(x)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 929, in forward
output_padding, self.groups, self.dilation)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([32, 256, 40, 40], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[4, 4], padding=[1, 1], stride=[2, 2], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
data_type = CUDNN_DATA_FLOAT
padding = [1, 1, 0]
stride = [2, 2, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = true
allow_tf32 = true
input: TensorDescriptor 0x55923cf1b4e0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 32, 256, 40, 40,
strideA = 409600, 1600, 40, 1,
output: TensorDescriptor 0x55923cf1d1b0
type = CUDNN_DATA_FLOAT
nbDims = 4
dimA = 32, 256, 20, 20,
strideA = 102400, 400, 20, 1,
weight: FilterDescriptor 0x55923cf4cec0
type = CUDNN_DATA_FLOAT
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 256, 256, 4, 4,
Pointer addresses:
input: 0x7fcf70a80000
output: 0x7fcf6fe00000
weight: 0x7fd1d1700000

@WangJian981002

@KiedaTamashi Did you solve the g++ "No such file or directory" problem you posted above (the missing dcn_v2_cuda.o, dcn_v2_im2col_cuda.o and dcn_v2_psroi_pooling_cuda.o files)? I'm hitting the same issue.

@sparkfax

PyTorch version: 1.7 stable
gcc 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
CUDA Version: 11.0
I can run PyTorch in other projects, so the PyTorch and CUDA versions should match.

make returns the following error:
Emitting ninja build file /home/opt/mot/DCNv2/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
FAILED: /home/opt/mot/DCNv2/build/temp.linux-x86_64-3.8/home/opt/mot/DCNv2/src/cuda/dcn_v2_cuda.o
/usr/local/cuda/bin/nvcc -DWITH_CUDA -I/home/opt/mot/DCNv2/src -I/root/anaconda3/envs/FairMOT/lib/python3.8/site-packages/torch/include -I/root/anaconda3/envs/FairMOT/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/FairMOT/lib/python3.8/site-packages/torch/include/TH -I/root/anaconda3/envs/FairMOT/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/anaconda3/envs/FairMOT/include/python3.8 -c -c /home/opt/mot/DCNv2/src/cuda/dcn_v2_cuda.cu -o /home/opt/mot/DCNv2/build/temp.linux-x86_64-3.8/home/opt/mot/DCNv2/src/cuda/dcn_v2_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -ccbin g++ -std=c++14
/home/opt/mot/DCNv2/src/cuda/dcn_v2_cuda.cu(107): error: identifier "THCState_getCurrentStream" is undefined
/home/opt/mot/DCNv2/src/cuda/dcn_v2_cuda.cu(279): error: identifier "THCState_getCurrentStream" is undefined
/home/opt/mot/DCNv2/src/cuda/dcn_v2_cuda.cu(324): error: identifier "THCudaBlas_Sgemv" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_0011fd41_00000000-6_dcn_v2_cuda.cpp1.ii".

@sparkfax

Following up on my error above: using this version, https://github.com/lbin/DCNv2, solved the "THCState_getCurrentStream" is undefined error.

@hhcs9527

hhcs9527 commented Feb 9, 2021

Is there a solution for compiling this branch with PyTorch = 1.8 and CUDA = 11.1 (from torch.version.cuda)?

@XDynames

XDynames commented Feb 9, 2021

@hhcs9527 Not yet. We have a version of deformable convolution (not ROI pooling) that does work with those versions, but it currently does not work well in multi-GPU training (it is very slow).
You might be able to patch what is here again by working out a suitable ATen function to replace the deprecated BLAS calls used. We felt like we'd be doing this forever, after literally having some of the functions we used as replacements deprecated in the next version of PyTorch (which dropped a day after we submitted this pull request).

@DrakeSkytecn

I still had many errors on Windows. Does this work on Windows?
Edit: Windows 10, torch==1.7.1, cuda 11.0
Edit 2: Add errors

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(28): error: calling a __host__ function("__floorf") from a __device__ function("dmcn_im2col_bilinear_cuda") is not allowed

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(28): error: identifier "__floorf" is undefined in device code

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(29): error: calling a __host__ function("__floorf") from a __device__ function("dmcn_im2col_bilinear_cuda") is not allowed

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(29): error: identifier "__floorf" is undefined in device code

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(65): error: calling a __host__ function("__floorf") from a __device__ function("dmcn_get_gradient_weight_cuda") is not allowed

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(65): error: identifier "__floorf" is undefined in device code

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(66): error: calling a __host__ function("__floorf") from a __device__ function("dmcn_get_gradient_weight_cuda") is not allowed

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(66): error: identifier "__floorf" is undefined in device code

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(92): error: calling a __host__ function("__floorf") from a __device__ function("dmcn_get_coordinate_weight_cuda") is not allowed

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(92): error: identifier "__floorf" is undefined in device code

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(93): error: calling a __host__ function("__floorf") from a __device__ function("dmcn_get_coordinate_weight_cuda") is not allowed

DCNv2/DCN/src/cuda/dcn_v2_im2col_cuda.cu(93): error: identifier "__floorf" is undefined in device code

I'm hitting the same error as you.
Windows 10, Python 3.8, torch 1.7.0, CUDA 10.2

@DrakeSkytecn

image

Finally built successfully!
Clone the code from this version: https://github.com/lbin/DCNv2/tree/pytorch_1.7
and replace every floor(...) with floorf(...),
ceil(...) with ceilf(...),
round(...) with roundf(...).
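
The manual search-and-replace described above can be scripted. The following is only a sketch: the helper name `use_float_math` is made up for illustration, and it assumes the calls are written without a space before the parenthesis (for example `floor(h)`), so it is worth reviewing the diff before building.

```python
import re

# Match floor( / ceil( / round( only when they are not part of a longer
# identifier (e.g. my_round) and not a member call (e.g. obj.round).
_CALL = re.compile(r"(?<![\w.])(floor|ceil|round)\(")


def use_float_math(source: str) -> str:
    """Return `source` with floor/ceil/round calls renamed to their
    single-precision variants floorf/ceilf/roundf."""
    return _CALL.sub(r"\1f(", source)
```

Calls that are already `floorf(...)`, `ceilf(...)` or `roundf(...)` do not match the pattern, so running the helper twice leaves the text unchanged. To apply it, read each `.cu` file as text, pass it through `use_float_math`, and write the result back.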

@DrakeSkytecn

DrakeSkytecn commented Mar 25, 2021 via email

@DrakeSkytecn

DrakeSkytecn commented Mar 25, 2021 via email

@rathaROG

Upgrade your Visual Studio to 2019.

------------------ Original message ------------------
From: "CharlesShang/DCNv2" @.>
Sent: Thursday, March 25, 2021, 7:31 PM @.>; @.@.>;
Subject: Re: [CharlesShang/DCNv2] Pytorch 1.6-1.8 compatability - CUDA11/3090 ready (#92)

@rathaROG commented on this pull request. In DCN/src/cpu/dcn_v2_cpu.cpp:

> @@ -1,5 +1,6 @@ #include <vector>

Hi @haruishi43, I still had one more problem:

[4/4] "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64/link.exe" dcn_v2_cuda.o dcn_v2_cpu.o dcn_v2_im2col_cpu.o dcn_v2_psroi_pooling_cpu.o dcn_v2_cuda.cuda.o dcn_v2_im2col_cuda.cuda.o dcn_v2_psroi_pooling_cuda.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib @.@at@@@.@@._N1@Z torch_cuda_cpp.lib @.@at@@yahxz torch.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib/x64" cudart.lib /out:DCNv2_gpu.pyd
FAILED: DCNv2_gpu.pyd
"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64/link.exe" dcn_v2_cuda.o dcn_v2_cpu.o dcn_v2_im2col_cpu.o dcn_v2_psroi_pooling_cpu.o dcn_v2_cuda.cuda.o dcn_v2_im2col_cuda.cuda.o dcn_v2_psroi_pooling_cuda.cuda.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda_cu.lib @.@at@@@.@@._N1@Z torch_cuda_cpp.lib @.@at@@yahxz torch.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\dev\exc\Anaconda3\envs\DEFT\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib/x64" cudart.lib /out:DCNv2_gpu.pyd
Creating library DCNv2_gpu.lib and object DCNv2_gpu.exp
MSVCRT.lib(loadcfg.obj) : error LNK2001: unresolved external symbol __enclave_config
DCNv2_gpu.pyd : fatal error LNK1120: 1 unresolved externals
ninja: build stopped: subcommand failed.

Thanks for the clue! I already had the latest version of VS2019, but I realized that I hadn't added the path to VS2019's cl.exe to the system PATH variable. In case you're interested, I also made a Windows-ready repo here:
https://github.com/rathaROG/DCNv2_Windows
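
A quick way to confirm which compiler the build would pick up is to look it up on PATH before running the build. This is a small sketch using only the standard library; the helper name `find_compiler` is illustrative and not part of DCNv2.

```python
import shutil


def find_compiler(name="cl"):
    """Return the full path of the compiler found on PATH, or None.

    On Windows, shutil.which also tries the extensions in PATHEXT,
    so "cl" resolves to cl.exe when it is reachable.
    """
    return shutil.which(name)


def describe_compiler(name="cl"):
    """Build a short human-readable status line for the lookup."""
    location = find_compiler(name)
    if location is None:
        return f"{name} is not on PATH"
    return f"{name} found at {location}"
```

If `describe_compiler()` reports a path under "Microsoft Visual Studio 14.0", the older toolchain is still first on PATH, which would match the link.exe path in the log above.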

@DrakeSkytecn

DrakeSkytecn commented Mar 26, 2021 via email

@yawara18 yawara18 mentioned this pull request Apr 8, 2021
@JohnPekl

JohnPekl commented Apr 9, 2021

Hi! Matthew,
I have a RTX3090, and cloned your project that you modified 14 hours ago.
While ./make.sh still get the error about:
nvcc fatal : Unsupported gpu architecture 'compute_86'
image

I got a Ubuntu 18.04, CUDA 11.1 pytorch 1.7, and gcc 7.5.0 / g++ 7.5.0

I guess it's probably the CUDA caused error, ANY HELP WOULD BE APPRECIATED!!

I had the same issue and fixed it with the following steps.
My computer: NVIDIA-SMI 460.39, Driver Version: 460.39, CUDA Version: 11.2; nvcc -V ==> Build cuda_11.0_bu.TC445_37.28540450_0

  1. Install pytorch 1.7.1 (py3.8_cuda11.0.221_cudnn8.0.5_0): conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch -c conda-forge
  2. Clone the latest source from DCNv2_latest
  3. Add '--gpu-architecture=compute_75','--gpu-code=sm_75' to the nvcc arguments in setup.py:

     extra_compile_args["nvcc"] = [
         "-DCUDA_HAS_FP16=1",
         "-D__CUDA_NO_HALF_OPERATORS__",
         "-D__CUDA_NO_HALF_CONVERSIONS__",
         "-D__CUDA_NO_HALF2_OPERATORS__",
         '--gpu-architecture=compute_75','--gpu-code=sm_75'
     ]

  4. ./make.sh
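
One caveat about step 3: with `--gpu-code=sm_75` alone, nvcc embeds machine code for that one architecture and no PTX, so a GPU from a newer generation has nothing it can JIT-compile. This may be related to the "no kernel image is available for execution on the device" error reported further down, although that is not confirmed here. The sketch below builds an alternative flag list with nvcc's `-gencode` syntax that also embeds PTX; the helper name `gencode_flags` is hypothetical and the result has not been tested against this repository.

```python
def gencode_flags(capabilities, embed_ptx=True):
    """Build nvcc -gencode flags for a list of compute capabilities.

    capabilities: strings such as "7.5", in ascending order.
    embed_ptx: also embed PTX for the highest capability listed, so that
    newer GPUs can JIT-compile the kernels at load time.
    """
    flags = []
    for cap in capabilities:
        num = cap.replace(".", "")
        # Real architecture: machine code (SASS) for exactly this target.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    if embed_ptx and capabilities:
        num = capabilities[-1].replace(".", "")
        # Virtual architecture: PTX that newer GPUs can compile on load.
        flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags
```

For example, `gencode_flags(["7.5"])` yields two flags that could replace the `--gpu-architecture` / `--gpu-code` pair in `extra_compile_args["nvcc"]`.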

@haruishi43

@JohnPekl have you tried running export TORCH_CUDA_ARCH_LIST='8.0+PTX' before running make.sh? It's only a temporary workaround but it should allow it to compile.
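
For anyone driving the build from a script rather than a shell, the same workaround can be sketched in Python. `TORCH_CUDA_ARCH_LIST` is the environment variable PyTorch's extension builder reads; the helper names `build_env` and `run_build` are illustrative, and the sketch assumes it is run from the directory that contains make.sh.

```python
import os
import subprocess


def build_env(arch_list="8.0+PTX", base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set.

    The current process environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list
    return env


def run_build(arch_list="8.0+PTX"):
    """Run ./make.sh with the architecture list applied (not called here)."""
    subprocess.run(["./make.sh"], env=build_env(arch_list), check=True)
```

Calling `run_build()` is equivalent to the `export` followed by `./make.sh`. The `+PTX` suffix asks for PTX to be embedded so that a GPU newer than 8.0 can still load the kernels.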

@JohnPekl

@JohnPekl have you tried running export TORCH_CUDA_ARCH_LIST='8.0+PTX' before running make.sh? It's only a temporary workaround but it should allow it to compile.

@haruishi43, I haven't tried running export TORCH_CUDA_ARCH_LIST='8.0+PTX'. The four steps mentioned above are all I did.

@hhd-shuai

I have compiled successfully using pytorch1.7. Thanks. @jerryhitit @MatthewHowe

I have downgraded PyTorch to 1.7 stable, but it doesn't work for me.
Do you have any suggestions? Thank you in advance.
runtimeerror

@bryanbocao

I used this docker image: docker pull nvidia/cuda:11.1-devel-ubuntu18.04, installed conda, then torch-nightly.
I then cloned and compiled DCNv2. This could be an issue with CUDA 11.0 or some other conflicting packages.
When DCN doesn't compile, the error that caused it is usually above your screen cap. If you run ./make again, the parts that already compiled will not be rebuilt, which makes it clearer what is causing the issue.

Hi @MatthewHowe I appreciate your work! Do you have specific commands to compile DCNv2? Thanks!

@Ada1223

Ada1223 commented Jul 31, 2021

@Xpangz this is a different error
In your case the g++ linker is failing to find compiled objects it expects to be created by the first build stage
Double check all your versions, if you used the above pip install to get the pytorch binary compiled with cuda 11 it will not be compatible with your install version of cuda 10.2
Your cuda version, cuda toolkit binary and pytorch (what cuda/cudnn it was compiled with) all have to agree for this to build

@XDynames thank you for your reply, I used conda install pytorch environment again, and it get solved.

Hello, I met the same error as you. Could you explain in more detail how you used conda install? I recreated the env and reinstalled PyTorch with: conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch,
but it still doesn't work. I'm going crazy.

@Dhagash4

Dhagash4 commented Oct 27, 2021

My system specs:
Ubuntu 20.04

Screenshot from 2021-10-27 17-20-07
Screenshot from 2021-10-27 17-20-37

NVIDIA GeForce RTX 3060 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.

In my conda environment I am using pytorch 1.7.0 with cuda 10.2. testcpu.py gives no error, but testcuda.py prints the warning above, pauses for a long time, and then fails with this error:

Unable to find a valid cuDNN algorithm to run convolution

If I upgrade the PyTorch version, the build fails. Any solutions or suggestions are much appreciated.

Thank you for your time
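
The warning quoted above can be reasoned about with a small check: compiled `sm_XY` code runs only on devices with the same major version and an equal or higher minor version. The sketch below applies that rule to a list of architecture tags. The function names are illustrative, `compute_` (PTX) entries are deliberately ignored, and this is not PyTorch's own check.

```python
def parse_arch(tag):
    """Split a tag such as "sm_75" into ("sm", 7, 5)."""
    kind, _, digits = tag.partition("_")
    number = int(digits)
    return kind, number // 10, number % 10


def has_native_kernel(device_capability, arch_list):
    """True if some sm_ entry can run directly on the given device.

    device_capability: (major, minor), e.g. (8, 6) for an RTX 3060 Ti.
    arch_list: tags as printed in the warning, e.g. ["sm_70", "sm_75"].
    """
    major, minor = device_capability
    for tag in arch_list:
        kind, arch_major, arch_minor = parse_arch(tag)
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
    return False
```

With the list from the warning, `has_native_kernel((8, 6), [...])` is false because no `sm_8x` entry is present. On a live install, `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` should supply the two arguments.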

@GeLink9999

seems ok after using https://github.com/tteepe/DCNv2

@Steinwang

Your cuda version, cuda toolkit binary and pytorch (what cuda/cudnn it was compiled with) all have to agree for this to build

Hello, could you please share your PyTorch, cudatoolkit, gcc -v, and nvcc -V version information? I keep hitting command 'g++' failed with exit status 1 and it is driving me crazy. @XDynames
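
To compare the versions asked about here, the CUDA version PyTorch was built with can be checked against what nvcc reports. The sketch below parses the "release X.Y" text from `nvcc -V` output and compares it with a version string such as the one in `torch.version.cuda`. The function names are illustrative, and requiring an exact major.minor match is a conservative assumption rather than a documented rule.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_cuda_version(version):
    """Turn a string such as "11.0" into (11, 0)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def toolkits_agree(torch_cuda, nvcc_output):
    """True if nvcc and the PyTorch build report the same major.minor."""
    return parse_nvcc_release(nvcc_output) == parse_cuda_version(torch_cuda)
```

A typical call would be `toolkits_agree(torch.version.cuda, subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout)`.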

@fkjslee

fkjslee commented Dec 27, 2021

Downgrading PyTorch to 1.7.0 stable didn't fix it for me. Sad....

@Ada1223

Ada1223 commented Feb 15, 2022 via email

@unbeliveyu

unbeliveyu commented Sep 1, 2022

pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html
Running the above command gave me the following error:
ERROR: torch has an invalid wheel, .dist-info directory not found

@unbeliveyu

pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html

I get the same error.

How did you modify it?
Please help me.

@unbeliveyu

I successfully compiled on Windows 10, CUDA 11.1 (RTX3090), and PyTorch 1.7. Thank you so much!

Can you solve this problem? I compiled successfully with CUDA 11 and PyTorch 1.7 (RTX 3090). Thank you very much @MatthewHowe

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1607370156314/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
/opt/conda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 33 leaked semaphores to clean up at shutdown
  len(cache))
Traceback (most recent call last):
  File "C_ddp.py", line 349, in
    main()
  File "C_ddp.py", line 109, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/wj/Detection/CenterNetV2/C_ddp.py", line 277, in main_worker
    center_loss, center_fuse_loss, scale_loss, offset_loss = model({'img':img, 'label':label, 'heatmap_t':heatmap_t, 'hm_FuseClass_t':hm_FuseClass_t})
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wj/Detection/CenterNetV2/nets/resnet_dcn_model.py", line 35, in forward
    out=self.backbone(x)[0]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wj/Detection/CenterNetV2/networks/resnet_dcn.py", line 261, in forward
    x = self.deconv_layers(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 929, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([32, 256, 40, 40], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 256, kernel_size=[4, 4], padding=[1, 1], stride=[2, 2], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 0]
    stride = [2, 2, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = true
    allow_tf32 = true
input: TensorDescriptor 0x55923cf1b4e0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 32, 256, 40, 40,
    strideA = 409600, 1600, 40, 1,
output: TensorDescriptor 0x55923cf1d1b0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 32, 256, 20, 20,
    strideA = 102400, 400, 20, 1
weight: FilterDescriptor 0x55923cf4cec0
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 256, 256, 4, 4,
Pointer addresses:
    input: 0x7fcf70a80000
    output: 0x7fcf6fe00000
    weight: 0x7fd1d1700000

How did you solve this problem?
After I successfully compiled DCNv2, this problem also occurred when compiling the whole program. Please help me.
Experts, please save me!

@unbeliveyu

I successfully compiled on Windows 10, CUDA 11.1 (RTX3090), and PyTorch 1.7. Thank you so much!

Excuse me, how did you compile it successfully? Did you restrict the build to compute capability 75 for the graphics card? Can you help me?

@3846chs

3846chs commented Nov 8, 2022

I forked from https://github.com/MatthewHowe/DCNv2, which adds compatibility with torch 1.7 / 1.8 and CUDA 10 / 11.
Its folder structure is slightly different from the original, which can cause errors in several projects that use DCNv2.
So I made the folder structure the same as the original: https://github.com/3846chs/DCNv2.git

My environment:
torch 1.7.1+cu110
cuda 11
RTX 3090

@yellowjs0304

yellowjs0304 commented Nov 22, 2022

Hi, I met the same error (compute_86) with "torch 1.7 / 1.8 with cuda 10 / 11".
Did you use an Anaconda environment? @3846chs

+) Fix
I'm using an Anaconda environment (torch 1.9.1+cu111 / cuda 11.2 / GPU 3080).
I don't know why this fixed it, but I re-installed CUDA by following the official CUDA install guide. I also installed cuDNN the official way (unzipped cudnn.tar and copied the cudnn*.h and cudnn.so.* files to /usr/local/cuda-11.2/include/ and lib64). Then I updated the PATH environment variable in my ~/.bashrc file.

In my case, not using conda install cudatoolkit solved the issue.

@junmuzi

junmuzi commented May 26, 2023

Hi, can someone help me to see why I get the following error here?

error: command '/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc' failed: No such file or directory: '/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc'

The full error message is as follows:
$ ./make.sh
...
/home/sda/lijun/Moving-object-detection-DSFNet/lib/models/DCNv2-master/DCN/src/cpu/dcn_v2_psroi_pooling_cpu.cpp:398:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES’
  398 | AT_DISPATCH_FLOATING_TYPES(out_grad.type(), "dcn_v2_psroi_pooling_cpu_backward", [&] {
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/ATen/ATen.h:9,
                 from /home/sda/lijun/Moving-object-detection-DSFNet/lib/models/DCNv2-master/DCN/src/cpu/dcn_v2_psroi_pooling_cpu.cpp:15:
/home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:363:7: note: declared here
  363 | T * data() const {
      | ^~~~
/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc -DWITH_CUDA -I/home/sda/lijun/Moving-object-detection-DSFNet/lib/models/DCNv2-master/DCN/src -I/home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include -I/home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/TH -I/home/junmuzi/anaconda3/envs/mod6/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/include -I/home/junmuzi/anaconda3/envs/mod6/include/python3.7m -c /home/sda/lijun/Moving-object-detection-DSFNet/lib/models/DCNv2-master/DCN/src/cuda/dcn_v2_cuda.cu -o build/temp.linux-x86_64-cpython-37/home/sda/lijun/Moving-object-detection-DSFNet/lib/models/DCNv2-master/DCN/src/cuda/dcn_v2_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin g++ -std=c++14
error: command '/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc' failed: No such file or directory: '/usr/local/cuda-11.0/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/usr/local/cuda-11.7/:/bin/nvcc'
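
The nvcc path in this error looks like several directories joined with ':' followed by /bin/nvcc. One plausible cause, stated here as an assumption, is that the CUDA_HOME (or CUDA_PATH) environment variable was extended like a PATH list instead of naming a single toolkit directory. The sketch below checks a candidate value; the function name `check_cuda_home` is illustrative.

```python
import os


def check_cuda_home(value, pathsep=os.pathsep):
    """Return a list of problems found with a CUDA_HOME value.

    An empty list means the value names one directory that contains
    bin/nvcc (or bin/nvcc.exe on Windows).
    """
    if not value:
        return ["CUDA_HOME is empty or unset"]
    if pathsep in value:
        # A PATH-style list cannot be used as a single toolkit root.
        return [f"CUDA_HOME holds several entries joined by {pathsep!r}; "
                "it should name one directory"]
    nvcc = os.path.join(value, "bin", "nvcc")
    if not (os.path.isfile(nvcc) or os.path.isfile(nvcc + ".exe")):
        return [f"no nvcc found at {nvcc}"]
    return []
```

Running `check_cuda_home(os.environ.get("CUDA_HOME", ""))` before `./make.sh` would flag a value like the one in the log. If it does, setting CUDA_HOME to a single directory such as /usr/local/cuda-11.0 is the likely remedy.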

@Ada1223

Ada1223 commented May 26, 2023 via email

@QingZhuanya

I forked from https://github.com/MatthewHowe/DCNv2 which fixes to be compatible with [torch 1.7 / 1.8 with cuda 10 / 11] This folder structure is slightly different from the original, which can cause errors in several projects using DCNv2. So, I made the folder structure the same as the original: https://github.com/3846chs/DCNv2.git

My environment: torch 1.7.1+cu110 cuda 11 RTX 3090

Thank you, wonderful work.

@Ada1223

Ada1223 commented Sep 26, 2024 via email
