OutOfMemoryError #41

Open
1048846280 opened this issue Jul 18, 2024 · 3 comments

@1048846280

I have a 12 GB GPU, but I get this error. It happened during training: the first 1000 steps were fine, and then this error occurred.

The sample rate is 16 kHz.

Steps : 985, Gen Loss: 1.028, Disc Loss: 0.007, Metric loss: 0.649, Magnitude Loss : 0.110, Phase Loss : 2.710, Complex Loss : 0.293, Time Loss : 0.123, s/b : 0.213
Steps : 990, Gen Loss: 0.493, Disc Loss: 0.002, Metric loss: 0.168, Magnitude Loss : 0.025, Phase Loss : 1.417, Complex Loss : 0.084, Time Loss : 0.097, s/b : 0.213
Steps : 995, Gen Loss: 0.779, Disc Loss: 0.001, Metric loss: 0.283, Magnitude Loss : 0.046, Phase Loss : 2.181, Complex Loss : 0.200, Time Loss : 0.146, s/b : 0.232
Steps : 1000, Gen Loss: 1.113, Disc Loss: 0.003, Metric loss: 0.666, Magnitude Loss : 0.134, Phase Loss : 2.843, Complex Loss : 0.368, Time Loss : 0.164, s/b : 0.206
Traceback (most recent call last):
  File "/media/MP-SENetmain/train.py", line 309, in <module>
    main()
  File "/media/MP-SENetmain/train.py", line 305, in main
    train(0, a, h)
  File "/media/MP-SENetmain/train.py", line 233, in train
    mag_g, pha_g, com_g = generator(noisy_mag.to(device), noisy_pha.to(device))
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 139, in forward
    x = self.TSConformer[i](x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 113, in forward
    x = self.freq_conformer(x) + x
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 73, in forward
    x = x + self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 43, in forward
    return self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 7.17 GiB (GPU 0; 10.75 GiB total capacity; 150.87 MiB already allocated; 7.19 GiB free; 1.53 GiB reserved in total by PyTorch)
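
(Aside for readers hitting the same trace: a single 7.17 GiB allocation failing inside a conv forward usually means one very large input, e.g. a long, un-split utterance, rather than a slow leak. A generic way to confirm this is to print PyTorch's allocator report at the failure point; torch.cuda.memory_summary() is standard PyTorch, available since 1.4. The sketch below is a debugging aid, not code from this repo.)

    import torch

    def run_step_with_oom_report(step_fn, *args, **kwargs):
        # Run one training/validation step; on CUDA OOM, dump the allocator
        # report (allocated vs. reserved vs. fragmented) before re-raising.
        try:
            return step_fn(*args, **kwargs)
        except RuntimeError as e:
            if "out of memory" in str(e):
                print(torch.cuda.memory_summary(abbreviated=True))
            raise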


vkeep commented Jul 24, 2024

I also met this problem, during validation. My GPU is 24 GB; for training I reduced the batch size to 2 and segment_size to 24000, but I still hit a similar OOM.


vkeep commented Jul 24, 2024

This problem occurs during validation. You can modify validset = Dataset(validation_indexes.... in train.py to pass split=True; the validation data will then be cut to segment_size, and the OOM problem is solved.
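
For reference, here is a minimal sketch of why split=True bounds memory. The constructor signature and cropping logic below are inferred from this comment, not copied from the repo's dataset.py: with split=True, each utterance is cropped (or zero-padded) to segment_size samples, so validation inputs can never be larger than training inputs.

    import random
    import torch

    class Dataset(torch.utils.data.Dataset):
        # Hypothetical stand-in for the repo's dataset.py, illustrating split=True.
        def __init__(self, indexes, segment_size=24000, split=True):
            self.indexes = indexes            # file list in the real code
            self.segment_size = segment_size
            self.split = split

        def __len__(self):
            return len(self.indexes)

        def _load(self, index):
            # Placeholder: the real code loads a waveform from disk.
            return torch.randn(random.randint(8000, 160000))

        def __getitem__(self, i):
            audio = self._load(self.indexes[i])  # 1-D waveform tensor
            if self.split:
                if audio.size(0) >= self.segment_size:
                    start = random.randint(0, audio.size(0) - self.segment_size)
                    audio = audio[start:start + self.segment_size]
                else:
                    audio = torch.nn.functional.pad(
                        audio, (0, self.segment_size - audio.size(0)))
            return audio

With split=False, whole utterances go through the model, and a single long validation file can trigger exactly the multi-GiB allocation shown in the traceback above.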

@1048846280 (Author)

> This problem occurs during validation. You can modify validset = Dataset(validation_indexes.... in train.py to pass split=True; the validation data will then be cut to segment_size, and the OOM problem is solved.

Thanks a million! This works now.
