Distributed Deep Learning with PyTorch

This repo contains the source code for a Parallel Computing course project. Its main purpose is to train a UNet on multiple GPUs using different distributed training strategies from PyTorch (DataParallel, DistributedDataParallel, and pipeline model parallelism).

Set up the environment and install packages

conda create -n unetdist
conda activate unetdist
git clone https://github.com/notnitsuj/DistributedPyTorch.git
cd DistributedPyTorch
bash install.sh

Download the data and unzip it into the data folder.

Train

To train on a single GPU without any parallelism, run

python3 train.py
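
For reference, single-GPU training in train.py follows a standard PyTorch loop. The sketch below is only illustrative; the UNet class, dataset, loss, and hyperparameters are placeholders, not the exact ones used in this repo.

import torch
from torch import nn
from torch.utils.data import DataLoader

from unet import UNet                      # hypothetical module names for illustration
from dataset import SegmentationDataset    # (this repo's actual modules may differ)

device = torch.device('cuda')
model = UNet().to(device)
loader = DataLoader(SegmentationDataset('data/'), batch_size=4, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

for images, masks in loader:
    images, masks = images.to(device), masks.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), masks)   # forward pass and loss
    loss.backward()                          # backward pass
    optimizer.step()                         # parameter update
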

To train using 2 GPUs with DataParallel, run

python3 train.py -t DP
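
DataParallel runs in a single process: it replicates the model on each GPU, scatters every input batch across the replicas, and gathers the outputs back on GPU 0. Wrapping the model is roughly a one-line change; a minimal sketch, assuming the single-GPU setup above:

import torch
from torch import nn

model = UNet()  # hypothetical UNet class, as above
if torch.cuda.device_count() > 1:
    # Replicate the model across 2 GPUs; batches are split along dim 0.
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()

# The training loop itself is unchanged: DataParallel handles the
# scatter/gather of inputs and outputs on every forward call.
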

To train using 2 GPUs with DistributedDataParallel, run

torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py -t DDP -b 2
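
Here torchrun launches one process per GPU and sets the rendezvous environment variables, so each process only needs to initialize the process group, pin its own device, wrap the model in DDP, and shard the data with a DistributedSampler. A minimal sketch (UNet and the dataset class are placeholders, not this repo's exact code):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

dist.init_process_group(backend='nccl')      # torchrun provides the rendezvous info
local_rank = int(os.environ['LOCAL_RANK'])   # one process per GPU
torch.cuda.set_device(local_rank)

model = DDP(UNet().cuda(), device_ids=[local_rank])   # hypothetical UNet
dataset = SegmentationDataset('data/')                # hypothetical dataset
sampler = DistributedSampler(dataset)                 # each rank gets a disjoint shard
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

# Gradients are all-reduced across ranks during backward(), so every
# process keeps an identical copy of the model parameters.
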

To train using 2 GPUs with Pipeline model parallelism, run

python3 train.py -t MP
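
Pipeline model parallelism splits the model itself across GPUs and overlaps micro-batches so both devices stay busy. The sketch below uses torch.distributed.pipeline.sync.Pipe with a toy two-stage split; the stages here are placeholders, and this repo's actual -t MP implementation may differ (in particular, UNet's skip connections need extra care when the model is cut into sequential stages).

import os
import torch
from torch import nn
from torch.distributed import rpc
from torch.distributed.pipeline.sync import Pipe

# Pipe requires the RPC framework to be initialised, even for a single process.
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')
rpc.init_rpc('worker', rank=0, world_size=1)

# Hypothetical split of a UNet-like model into two stages, one per GPU.
stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()).to('cuda:0')
stage2 = nn.Sequential(nn.Conv2d(64, 1, 3, padding=1)).to('cuda:1')

model = Pipe(nn.Sequential(stage1, stage2), chunks=4)  # 4 micro-batches per batch

x = torch.randn(8, 3, 128, 128, device='cuda:0')
out = model(x).local_value()   # the output tensor lives on cuda:1
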
