ParallelMNIST

A simple example of how to parallelize MNIST training using PyTorch.

First create a new conda environment, then install the following packages into it:

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install matplotlib
conda install -c conda-forge torchinfo

I'll take this tutorial and add parallel training to it: https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118
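
The steps below assume a CNN model along the lines of that tutorial. For reference, here is a minimal sketch of such a model; the layer sizes are illustrative, not necessarily the tutorial's exact architecture:

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, padding=2),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.out = nn.Linear(32 * 7 * 7, 10)              # 10 digit classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten for the linear layer
        return self.out(x)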

Steps for parallelizing the code (running on multiple GPUs); a sketch putting all three steps together follows the list:

  1. In your train loop, you must move the data to the GPU:
    img = img.to(device)
    where device should be "cuda"
  2. After you instantiate your model, move it to the GPU:
    cnn = torch.nn.DataParallel(cnn).cuda()
  3. You can optionally set:
    cudnn.benchmark = True
    - this may give a performance boost
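
Here is a minimal training-loop sketch combining the three steps. The CNN class is the one sketched above; the optimizer, learning rate, and epoch count are illustrative placeholders, not necessarily what the tutorial uses.

import torch
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda"
cudnn.benchmark = True  # step 3 (optional): lets cuDNN pick fast kernels for fixed-size inputs

train_data = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())
train_loader = DataLoader(train_data, batch_size=100, shuffle=True)

cnn = CNN()                              # model class sketched above
cnn = torch.nn.DataParallel(cnn).cuda()  # step 2: replicate the model across all visible GPUs

optimizer = torch.optim.Adam(cnn.parameters(), lr=0.01)
loss_func = torch.nn.CrossEntropyLoss()

for epoch in range(10):                  # epoch count is arbitrary here
    for img, label in train_loader:
        img = img.to(device)             # step 1: move the inputs to the GPU
        label = label.to(device)         # ...and the targets too
        output = cnn(img)
        loss = loss_func(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()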

While your model is training, you can run

nvidia-smi

in a second terminal window. If everything is working correctly, you should see something similar to this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:19:00.0 Off |                  N/A |
| 18%   35C    P2    60W / 250W |    929MiB / 11019MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:1A:00.0 Off |                  N/A |
| 18%   39C    P2    64W / 250W |    927MiB / 11019MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:67:00.0 Off |                  N/A |
| 18%   42C    P2    67W / 250W |    927MiB / 11019MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:68:00.0 Off |                  N/A |
| 18%   43C    P2    45W / 250W |    990MiB / 11016MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

We can see that every GPU has some memory allocated (almost 1 GB per GPU in this example). If you increase the batch size, you should see the Memory-Usage on each GPU go up.
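
With the training sketch above, that just means raising batch_size when constructing the loader (500 here is an arbitrary illustrative value):

# DataParallel splits each batch across the GPUs, so with 4 GPUs and
# batch_size=500 each GPU processes roughly 125 samples per step.
train_loader = DataLoader(train_data, batch_size=500, shuffle=True)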

If you ever see an error like:

 RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This error indicates that some of your tensors are on the CPU and some are on the GPU. I made this mistake while putting this tutorial together: I did all my training on the GPU but forgot to move my TEST set to the GPU as well.
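
For completeness, here is a minimal sketch of the corrected evaluation loop, building on the training sketch above; the test_loader construction and the accuracy bookkeeping are my illustration, not the tutorial's exact code:

test_data = datasets.MNIST(root="data", train=False, download=True,
                           transform=transforms.ToTensor())
test_loader = DataLoader(test_data, batch_size=100)

cnn.eval()
with torch.no_grad():
    correct = total = 0
    for img, label in test_loader:
        img = img.to(device)      # forgetting this line triggers the RuntimeError above
        label = label.to(device)
        pred = cnn(img).argmax(dim=1)
        correct += (pred == label).sum().item()
        total += label.size(0)
print("test accuracy:", correct / total)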
