nvidia-smi doesn't seem to show the actual GPU load when training a neural net?

by Ivan Novikov   Last Updated September 11, 2019 16:02

I have been training a neural network for the last two days, and occasionally monitoring GPU usage with nvidia-smi, which gives the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 34%   44C    P8    12W / 200W |    208MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1298      G   /usr/lib/xorg/Xorg                            22MiB |
|    0      1335      G   /usr/bin/sddm-greeter                         48MiB |
|    0      2904      G   /usr/lib/xorg/Xorg                            10MiB |
|    0     22777      G   /usr/lib/xorg/Xorg                            10MiB |
|    0     27625      C   python                                       103MiB |
+-----------------------------------------------------------------------------+

It seemed weird that so little was being used, but I didn't think much of it.
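
In case it matters, this is roughly how I poll the same numbers while training runs in another terminal (just a minimal sketch built on nvidia-smi's standard --query-gpu fields; the one-second interval, the parsing, and the single-GPU assumption are my own choices):

import subprocess
import time

def gpu_stats():
    # Ask nvidia-smi for utilization and memory figures as plain CSV
    # (assumes a single GPU, so the output is one line like "0, 208, 8119").
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    util, used, total = (int(x) for x in out.split(", "))
    return util, used, total

while True:
    util, used, total = gpu_stats()
    print(f"GPU util: {util}%  memory: {used}/{total} MiB")
    time.sleep(1)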

I then started working on a different neural net, which also required training. When I ran its training script, I received the following error:

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 7.93 GiB total capacity; 7.09 GiB already allocated; 66.00 MiB free; 17.45 MiB cached)

This seems to indicate that all of the GPU memory is in use, which makes sense, but why is this not reflected in the output of nvidia-smi?
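
For what it's worth, this is roughly what I can query from inside the Python process itself (a minimal sketch assuming PyTorch, since that's what raised the error; on older PyTorch versions memory_reserved() was called memory_cached()):

import torch

device = torch.device("cuda:0")

# memory_allocated: memory currently occupied by live tensors.
# memory_reserved:  memory held by PyTorch's caching allocator, which is
#                   closer to what nvidia-smi reports for the process.
allocated = torch.cuda.memory_allocated(device) / 1024 ** 2
reserved = torch.cuda.memory_reserved(device) / 1024 ** 2
print(f"allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB")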


