run yolov3 with GPU:CUDA Error: out of memory #791

Open
zsjerongdu opened this issue May 17, 2018 · 36 comments

@zsjerongdu

Hi,
I built darknet with "GPU=1,CUDNN=1,OPENCV=1" successfully. However, when I use the command "sudo ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg", it shows:
CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
But if I use the command "sudo ./darknet detector test cfg/coco.data cfg/yolov2.cfg yolov2.weights data/dog.jpg", it detects targets successfully.
It seems the problem is specific to yolov3. What can I do to solve it?

@dayn9t

dayn9t commented May 17, 2018

I'm hitting the same problem.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:09:00.0 On | N/A |
| 34% 36C P8 N/A / 75W | 309MiB / 1997MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

@hjchai

hjchai commented May 18, 2018

@dayn9t I think your GPU is low on memory. When YOLOv3 is fully loaded onto the GPU it takes about 1600 MiB with the default setting (416x416) on my computer; add the ~300 MiB used by the display and other applications, and it is very likely to throw an OOM error. Try running on a GPU with more memory, or reduce the width and height settings in your cfg file (note: reducing the size might impact your detection results).
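As a quick sanity check of how close you are to that limit, you can watch GPU memory usage while the network loads (a minimal sketch using standard NVIDIA tooling, not from the original comment; the exact numbers will differ per machine):

watch -n 1 nvidia-smi     # refreshes the memory-usage table every second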

@zsjerongdu
Author

Try to reboot and it may help.

@dayn9t

dayn9t commented Jun 4, 2018

Thanks @hjchai.
I found the location of the error: darknet/src/maxpool_layer.c:46, where the variable 'batch' was 128.
I changed batch to 1 in the cfg file, and it worked.
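For anyone who wants to script that edit, a minimal sketch (assuming the stock cfg/yolov3.cfg layout; the sed command is my addition, not part of the original suggestion):

sed -i 's/^batch=.*/batch=1/' cfg/yolov3.cfg     # rewrite the uncommented batch line to batch=1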

@arya-coding

arya-coding commented Sep 16, 2018

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

@cesarhcq

I had the same problem with a GT740M with 4096 MB of memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:
batch=1
subdivisions=1
width=416
height=416

That worked for me!

Thank you very much!

@idpdka

idpdka commented Nov 1, 2018

I had the same problem with a GT740M with 4096 MB of memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:
batch=1
subdivisions=1
width=416
height=416

Your solution works for me, thank you very much!

@Aurora11111

Do the width and height affect the inference result?

@cesarhcq

Do the width and height affect the inference result?

That's a good question; I guess not!

@SonaHarutyunyan

At https://github.com/AlexeyAB/darknet you can find this note:
Note: if error Out of memory occurs then in .cfg-file you should increase subdivisions=16, 32 or 64

@amandazw

I had the same problem with a GT740M with 4096 MB of memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:
batch=1
subdivisions=1
width=416
height=416

It works! Thank you so much!

@rvjenya

rvjenya commented Dec 26, 2018

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

Thanks, it works!

@dav-ell

dav-ell commented Jan 15, 2019

Another solution that worked for me was to use one of the alternate config files: yolov3-tiny.cfg.

You'll notice that @aryus96's options are what is used in that file as well:


[net]
...
batch=1
subdivisions=1
...
width=416
height=416

So instead of using the command

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

I use

./darknet detect cfg/yolov3-tiny.cfg yolov3.weights data/dog.jpg

I can also use yolov3-openimages.cfg, yolov3-spp.cfg, and yolov3-voc.cfg without errors.

@Flock1

Flock1 commented Mar 13, 2019

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

Hi,
I'm using batch=64 and subdivisions=2. It still runs out of memory.

@zenogantner

To avoid an OOM during prediction, I had to set

batch=16
subdivisions=16
width=608
height=608

on my Quadro M1200 with 4GB of RAM.

@abdou31

abdou31 commented Mar 22, 2019

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

Not working, it shows:
Error: You set incorrect value batch=1 for Training! You should set batch=64 subdivision=64

@zenogantner

@abdou31 I think most of the comments in this issue are about inference, not training.

@oshadaamila

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

This worked on my NVIDIA GT940M.

@abdou31

abdou31 commented Mar 30, 2019

@zenogantner but what about training?

@dhilip1

dhilip1 commented Mar 31, 2019

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

It worked!! Finally I am able to train my dataset! I had thought the NVIDIA drivers were already installed on my laptop by default.

@ingmarsell

ingmarsell commented Apr 27, 2019

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

When I uncommented batch=1 and subdivisions=1, I got it working with a detection time of 2.9 seconds.

At https://github.com/AlexeyAB/darknet you can find this note:
Note: if error Out of memory occurs then in .cfg-file you should increase subdivisions=16, 32 or 64

But by undoing the changes and just increasing subdivisions from 16 to 32, I got a detection time of 0.2 seconds.

Running RTX 2060 with nvidia-418 and cuda 10.1

Edit:
width=608
height=608
was used throughout

@ben423423n32j14e

ben423423n32j14e commented May 1, 2019

I'm using an NVIDIA GT 1030 (2 GB memory), getting a prediction time of 0.162808 seconds with these settings:

batch=32
subdivisions=32
width=416
height=416

I have not found a settings combination that runs with a width and height of 608; I see this error:

74 res 71 19 x 19 x1024 -> 19 x 19 x1024
CUDA Error: out of memory

Looking at nvidia-smi, it seems like it "only just" runs out of memory at 608; if there were an extra 500 MB of memory on the card, I suspect it would work. :(

@twodoge

twodoge commented Jun 1, 2019

Try to reboot and it may help.

I did what he said, and then it succeeded.

@pirate-lofy

Thanks @hjchai.
I found the location of the error: darknet/src/maxpool_layer.c:46, where the variable 'batch' was 128.
I changed batch to 1 in the cfg file, and it worked.

I found the variable batch in cfg/yolov3.cfg with a value of 40; when I changed it to 1 it produced an error saying "0 cuda malloc failed", but when I changed it to 31 or less it worked.
I think this number will vary from one PC to another.

@AbhimanyuAryan

AbhimanyuAryan commented Aug 6, 2019

I was using a 2080 Ti and it showed me this error, so I went through some Stack Overflow links. Remove NVIDIA's outdated driver (if you followed some blog post to install the driver and CUDA).

Download the latest drivers manually from the official website. You are good to go... error fixed 🎉🎉🎉

@antonmilev

antonmilev commented Aug 14, 2019

Hello. I have the latest NVIDIA drivers and a GTX 1050; nvidia-smi shows that I have 4 GB of GPU memory:
2019/08/14 17:19:27.728, 2400 MiB, 4096 MiB
But why does darknet show CUDA out of memory for
batch=32
subdivisions=16
after it reaches 3488 MB? The most it is able to allocate is 2400 MB, so I am forced to use
batch=32
subdivisions=32
I wonder whether it is an NVIDIA bug and I actually have less than 4 GB of usable GPU memory.
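For reference, the readout above looks like the output of an nvidia-smi query; a hedged sketch of how to log the same fields yourself (the exact field list is my guess, not confirmed by the original comment):

nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv,noheader -l 1     # prints one CSV line per second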

@intelltech

I have a 2 GB GT 740M and it worked with YOLOv3-tiny, using both the tiny configuration and the tiny weights.

@barzan-hayati

Thanks @aryus96.

I have this problem in a different way. I explained it in "Darknet docker image doesn't work after shipping to another system" #4082.

First machine: RTX 2080 Ti with 11 GB of memory
Second machine: GeForce 1050 Ti with 4 GB of memory

I downloaded a docker image with these specs: cuda9.2.148, cudnn7.5.0, opencv3.4.6.

Then I pulled darknet and built it. I could successfully train and test tiny-yolov2 and tiny-yolov3 in docker. Then I committed the container, saved it to a .tar file, and moved it to another system. Now I want to test darknet on the new system by loading the docker image.

On the new machine, tiny-yolov2 could not detect any objects, and tiny-yolov3 failed with CUDA Error: out of memory. I guess the second problem arises from the lower GPU memory on the second machine, but do you have any idea about the first problem (detection works on one machine but not the other)?
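Not an answer to the detection question, but for anyone following the same path, a rough sketch of the move-between-machines workflow described above (the image and container names are placeholders, not taken from the original post):

docker commit <container_id> darknet-img       # snapshot the container in which darknet was built
docker save -o darknet-img.tar darknet-img     # export the image to a tar file
docker load -i darknet-img.tar                 # on the second machine, import it
docker run --gpus all -it darknet-img          # or --runtime=nvidia on older nvidia-docker2 setups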

@Monkey1GIt

I am training YOLO on VOC following the instructions at https://pjreddie.com/darknet/yolo/#train-voc
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74
Even though I modified the cfg to:

batch=1
subdivisions=1
width=416
height=416

I also got the error Error: You set incorrect value batch=1 for Training! You should set batch=64 subdivision=64 like @abdou31 did.
But when I modified the cfg to:

batch=1
subdivisions=16
width=320
height=320

both errors disappeared.
FYI: as https://github.com/pjreddie/darknet/issues/791#issuecomment-390096490 notes, this may impact detection precision.
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection also mentions it:

increase network resolution in your .cfg-file (height=608, width=608 or any value multiple of 32) - it will increase precision

@XunshanMan

XunshanMan commented Apr 25, 2020

Hi, I used an NVIDIA 840M GPU with 2 GB of memory and met the same problem.
Funnily enough, after I closed VS Code, which took up around 300 MB of memory, and started again, it worked. It seems YOLO needs about 1.3 GB of memory for me.

I also tried increasing batch and subdivisions (they need to be the same, or there will be a problem) in the .cfg file; it didn't work, even when I turned them up to 1024.

@DimasVeliz

I had the same problem with a GT740M with 4096 MB of GDDR4 memory. NVIDIA 384.130, CUDA 9, cuDNN, OpenCV 3.3.

My solution to run YOLOv3 was to modify cfg/yolov3.cfg:

batch=1
subdivisions=1
width=416
height=416

Beers and cheers for this guy! It works!

@nisarggandhewar


How can I reduce darknet training time? Is there any cloud-based service we can use to get a more powerful GPU and reduce training time?
Can we configure darknet on Google Colab?
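Not a definitive answer, but here is a minimal sketch of what a GPU build of this repository might look like in a Colab-style shell (assumptions: the runtime has a GPU attached, the usual weights URL from the project site is still valid, and newer CUDA versions may require adjusting the ARCH= line in the Makefile, as a later comment in this thread does for an RTX 3060):

git clone https://github.com/pjreddie/darknet
cd darknet
sed -i 's/GPU=0/GPU=1/' Makefile     # enable the CUDA build
make
wget https://pjreddie.com/media/files/yolov3.weights
./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg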

@prh-t

prh-t commented May 25, 2020

I also got the error Error: You set incorrect value batch=1 for Training! You should set batch=64 subdivision=64 like @abdou31 did.
But when I modified the cfg to:

batch=1
subdivisions=16
width=320
height=320

both errors disappeared.
FYI: as https://github.com/pjreddie/darknet/issues/791#issuecomment-390096490 notes, this may impact detection precision.
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection also mentions it:

increase network resolution in your .cfg-file (height=608, width=608 or any value multiple of 32) - it will increase precision

Just a hypothesis: having subdivisions / batch == 16 resolves the out-of-memory issue... somehow.
I set mine to batch=4, subdivisions=64, height=608, width=608, and it's running OK so far, fingers crossed...

@prh-t

prh-t commented May 25, 2020

Try to reboot and it may help.

This might work for some, as (possibly) your previous failed runs are still occupying some memory; a similar approach would be to kill all stale Python processes.
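A minimal sketch of that approach (the PID is whatever nvidia-smi reports; shown here as a placeholder):

nvidia-smi              # the process table at the bottom lists PIDs holding GPU memory
sudo kill -9 <PID>      # stop the stale process so its GPU memory is released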

@dhgokul

dhgokul commented Sep 1, 2020

I tried rebooting the device and modifying the YOLO config file; I am still getting the issue. Any help appreciated!

batch=1
subdivisions=16
width=320
height=320

batch=1
subdivisions=1
width=416
height=416

Error log:
GPU mode with 1.0 usage
2020-09-01 00:55:35.752924: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-09-01 00:55:35.755909: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x16b81140 executing computations on platform Host. Devices:
2020-09-01 00:55:35.756060: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): ,
2020-09-01 00:55:35.934491: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2020-09-01 00:55:35.934823: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x16b500b0 executing computations on platform CUDA. Devices:
2020-09-01 00:55:35.934900: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-09-01 00:55:35.936965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.20GiB
2020-09-01 00:55:35.937042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-09-01 00:55:41.343127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-01 00:55:41.343197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-09-01 00:55:41.343228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-09-01 00:55:41.343471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3964 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-09-01 00:55:58.584739: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 3.87G (4156932096 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Killed

@pankaja0285

pankaja0285 commented Dec 30, 2021

I have the following:
NOTE: My laptop is a Windows 10 machine (Dell, AMD Ryzen).

  • darknet with "GPU=1,CUDNN=1,OPENCV=1" in its Makefile (I use the CMake tool for Windows and build the solution in VS 2019 to generate darknet.exe)
  • I have an NVIDIA GeForce RTX 3060, for which, according to this link,
    I need to use compute capability 8.6, which means in the Makefile
    I have set the arch as
    ARCH= -gencode arch=compute_86,code=[sm_86,compute_86]
  • Then in my yolov4-custom.cfg I have set
    [net]
    batch=32
    subdivisions=16
    width=256
    height=256

...
classes=20
...
filters=75
....
among other setting changes...
In my case the number of classes I am training is 20, hence classes=20, and accordingly I have set filters=75,
as the math goes (number of classes + 5) * 3 = (20 + 5) * 3 = 75 (as per this link).
But for loading the cuDNN convolutional layers onto the GPU, I found this combination works...

In fact, after struggling for 2 days, I just came up with this combination of params in yolov4-custom.cfg and started to train about an hour ago. Keeping my fingers crossed, but the hurdle of being unable to load the forward convolutional layer, or the infamous Error: cuDNN isn't found FWD algo for convolution, is definitely not appearing anymore. It's running past all that and training, and I do see a set of weights saved.
