OutOfMemoryError #41

Open
1048846280 opened this issue Jul 18, 2024 · 3 comments

@1048846280

I have a 12 GB GPU, but I get this error. It happened during training: the first 1000 steps were fine, and then this error occurred.

The sample rate is 16 kHz.

Steps : 985, Gen Loss: 1.028, Disc Loss: 0.007, Metric loss: 0.649, Magnitude Loss : 0.110, Phase Loss : 2.710, Complex Loss : 0.293, Time Loss : 0.123, s/b : 0.213
Steps : 990, Gen Loss: 0.493, Disc Loss: 0.002, Metric loss: 0.168, Magnitude Loss : 0.025, Phase Loss : 1.417, Complex Loss : 0.084, Time Loss : 0.097, s/b : 0.213
Steps : 995, Gen Loss: 0.779, Disc Loss: 0.001, Metric loss: 0.283, Magnitude Loss : 0.046, Phase Loss : 2.181, Complex Loss : 0.200, Time Loss : 0.146, s/b : 0.232
Steps : 1000, Gen Loss: 1.113, Disc Loss: 0.003, Metric loss: 0.666, Magnitude Loss : 0.134, Phase Loss : 2.843, Complex Loss : 0.368, Time Loss : 0.164, s/b : 0.206
Traceback (most recent call last):
  File "/media/MP-SENetmain/train.py", line 309, in <module>
    main()
  File "/media/MP-SENetmain/train.py", line 305, in main
    train(0, a, h)
  File "/media/MP-SENetmain/train.py", line 233, in train
    mag_g, pha_g, com_g = generator(noisy_mag.to(device), noisy_pha.to(device))
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 139, in forward
    x = self.TSConformer[i](x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/generator.py", line 113, in forward
    x = self.freq_conformer(x) + x
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 73, in forward
    x = x + self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/MP-SENetmain/models/conformer.py", line 43, in forward
    return self.ccm(x)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/anaconda3/envs/MP-SENetmain/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 7.17 GiB (GPU 0; 10.75 GiB total capacity; 150.87 MiB already allocated; 7.19 GiB free; 1.53 GiB reserved in total by PyTorch)
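
(Aside for readers hitting the same trace: a single 7.17 GiB allocation failing inside a conv forward usually means one very large input, e.g. a long, un-split utterance, rather than a slow leak. A generic way to confirm this is to print PyTorch's allocator report at the failure point; torch.cuda.memory_summary() is standard PyTorch, available since 1.4. The sketch below is a debugging aid, not code from this repo.)

    import torch

    def run_step_with_oom_report(step_fn, *args, **kwargs):
        # Run one training/validation step; on CUDA OOM, dump the allocator
        # report (allocated vs. reserved vs. fragmented) before re-raising.
        try:
            return step_fn(*args, **kwargs)
        except RuntimeError as e:
            if "out of memory" in str(e):
                print(torch.cuda.memory_summary(abbreviated=True))
            raise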


vkeep commented Jul 24, 2024

I also met this problem, during validation. My GPU is 24 GB; for training I reduced the batch size to 2 and segment_size to 24000, but I still hit a similar OOM.


vkeep commented Jul 24, 2024

This problem occurs during validation. You can modify validset = Dataset(validation_indexes.... in train.py to pass split=True; the validation data will then be cut to segment_size, and the OOM problem is solved.
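
For reference, here is a minimal sketch of why split=True bounds memory. The constructor signature and cropping logic below are inferred from this comment, not copied from the repo's dataset.py: with split=True, each utterance is cropped (or zero-padded) to segment_size samples, so validation inputs can never be larger than training inputs.

    import random
    import torch

    class Dataset(torch.utils.data.Dataset):
        # Hypothetical stand-in for the repo's dataset.py, illustrating split=True.
        def __init__(self, indexes, segment_size=24000, split=True):
            self.indexes = indexes            # file list in the real code
            self.segment_size = segment_size
            self.split = split

        def __len__(self):
            return len(self.indexes)

        def _load(self, index):
            # Placeholder: the real code loads a waveform from disk.
            return torch.randn(random.randint(8000, 160000))

        def __getitem__(self, i):
            audio = self._load(self.indexes[i])  # 1-D waveform tensor
            if self.split:
                if audio.size(0) >= self.segment_size:
                    start = random.randint(0, audio.size(0) - self.segment_size)
                    audio = audio[start:start + self.segment_size]
                else:
                    audio = torch.nn.functional.pad(
                        audio, (0, self.segment_size - audio.size(0)))
            return audio

With split=False, whole utterances go through the model, and a single long validation file can trigger exactly the multi-GiB allocation shown in the traceback above.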

@1048846280 (Author)

> This problem occurs during validation. You can modify validset = Dataset(validation_indexes.... in train.py to pass split=True; the validation data will then be cut to segment_size, and the OOM problem is solved.

Thanks a million! This works now.
