eval.py error: RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match. #363

Open
stealth0414 opened this issue Mar 17, 2023 · 8 comments

Comments

@stealth0414

After training on my own dataset, I tried to run eval.py and got this error:
Traceback (most recent call last):
  File "eval.py", line 193, in <module>
    main()
  File "eval.py", line 79, in main
    Eval(experiment, experiment_args, cmd=args, verbose=args['verbose']).eval(args['visualize'])
  File "eval.py", line 164, in eval
    model = self.init_model()
  File "eval.py", line 107, in init_model
    model = self.structure.builder.build(self.device)
  File "/hy-tmp/DB-yanhua/structure/builder.py", line 24, in build
    model = Model(self.model_args, device,
  File "/hy-tmp/DB-yanhua/structure/model.py", line 37, in __init__
    self.model = BasicModel(args)
  File "/hy-tmp/DB-yanhua/structure/model.py", line 15, in __init__
    self.backbone = getattr(backbones, args['backbone'])(**args.get('backbone_args', {}))
  File "/hy-tmp/DB-yanhua/backbones/resnet.py", line 310, in deformable_resnet50
    model.load_state_dict(model_zoo.load_url(
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 731, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 905, in _legacy_load
    return legacy_load(f)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 841, in legacy_load
    tensor = torch.tensor([], dtype=storage.dtype).set_(
RuntimeError: Attempted to set the storage of a tensor on device "cuda:0" to a storage on different device "cpu". This is no longer allowed; the devices must match.

def deformable_resnet50(pretrained=True, **kwargs):
    """Constructs a ResNet-50 model with deformable conv.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3],
                   dcn=dict(modulated=True,
                            deformable_groups=1,
                            fallback_on_stride=False),
                   stage_with_dcn=[False, True, True, True],
                   **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(
            model_urls['resnet50']), strict=False)
    return model

Why does eval still need to load the ResNet-50 pretrained weights after training? Has anyone solved this?
I also tried commenting out the `if pretrained` branch, but then all the metrics are 0:
[INFO] [2023-03-17 18:04:29,041] precision : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] recall : 0.000000 (44)
[INFO] [2023-03-17 18:04:29,042] fmeasure : 0.000000 (1)
Thanks.

@YunlongGa

I ran into the same problem. Have you solved it?

@stealth0414
Author

I don't remember, but you could try training on top of the author's pretrained model.

@SairaiL

SairaiL commented Apr 17, 2023

It may be a PyTorch issue. I also get this error on Linux with version 11.3, but on my own computer with 10.2 there is no problem.

@YunlongGa

OK, thank you.

@MxxM-max

MxxM-max commented Aug 3, 2024

Have you solved this? I am running into the same problem.

@Realtyxxx

@Obezyan0941

I ran into this issue, and this is how I worked around it:
The issue traces back to line 46 of resnet.py. For training and validation I changed the line to:
pretrained_dict = model_zoo.load_url(url)
But that does not work for eval. For eval I change the line to:
pretrained_dict = model_zoo.load_url(url, map_location=device)
I have no idea how to solve it completely, but this works fine for now.
Hope it helps!
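
One way to avoid switching that line between training and eval might be to let map_location follow whatever device new tensors are currently created on. The sketch below assumes the mismatch comes from the global default tensor type having been switched to CUDA before the backbone is built, which is what the "cuda:0" vs "cpu" message suggests. load_pretrained_state_dict is a hypothetical helper name, not something from the repository.

```python
def load_pretrained_state_dict(url, map_location=None):
    """Download (or read from cache) a state dict and load it onto a matching device.

    Hypothetical helper for illustration. If map_location is not given, it is
    set to the device on which torch currently creates new tensors, so the
    loaded storages and the freshly created tensors end up on the same device.
    """
    # Imported inside the function so the helper can be pasted anywhere;
    # torch is its only dependency.
    import torch
    from torch.utils import model_zoo

    if map_location is None:
        # torch.empty(0) is allocated on the current default device:
        # "cpu" normally, "cuda:0" if the default tensor type was set to CUDA.
        map_location = torch.empty(0).device

    # model_zoo.load_url forwards map_location to torch.load.
    return model_zoo.load_url(url, map_location=map_location)
```

In resnet.py the call would then read pretrained_dict = load_pretrained_state_dict(url) in both modes. Passing map_location explicitly still works if you want to force a particular device.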

@FutureZQ

    def init_torch_tensor(self):
        # Use gpu or not
        torch.set_default_tensor_type('torch.FloatTensor')
        if torch.cuda.is_available():
            self.device = torch.device('cuda')
            # torch.set_default_tensor_type('torch.cuda.FloatTensor')
        else:
            self.device = torch.device('cpu')
        print(self.device)

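I commented out the torch.set_default_tensor_type('torch.cuda.FloatTensor') line, and now both eval and training work. It seems the default tensor type was being changed to the CUDA type, so newly created tensors landed on the GPU while the loaded weights were still on the CPU.
My environment is "torch2.4+cu11.8". It works well with the trained model totaltext_dbnet18.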
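
A minimal sketch of the same idea without any global switch: pick the device once and pass it explicitly to the model and to each tensor you create. select_device_name, init_device and zeros_on are hypothetical names for illustration, not functions from the repository.

```python
def select_device_name(cuda_available: bool) -> str:
    """Return the device string to use, without touching global torch state."""
    return "cuda" if cuda_available else "cpu"


def init_device():
    """Build a torch.device and leave the default tensor type at its CPU default."""
    import torch  # imported lazily; only needed once a device object is built

    return torch.device(select_device_name(torch.cuda.is_available()))


def zeros_on(device, shape=(1, 3, 32, 32)):
    """Create a tensor directly on the chosen device instead of relying on a default."""
    import torch

    return torch.zeros(shape, device=device)
```

With this, the model is moved once with model.to(device) after its weights are loaded, and checkpoint loading no longer depends on which default tensor type happens to be active.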
