yukwangmin/LSGD_Model_Parallel

 
 

# LSGD_Model_Parallel

07/09/2019

  • model_parallel.py: the class definitions for model parallelism and pipeline model parallelism

  • m_LSGD.py: a modified version of LSGD that imports the class definitions from model_parallel.py

  • run_m_LSGD.sh: the run script on Cori-GPUs.


Currently, model parallelism and pipeline model parallelism are supported only for the ResNet50 architecture. Model parallelism uses only 2 GPUs.
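To illustrate the difference between the two modes, here is a minimal sketch of a forward-pass pipeline schedule. It assumes the pipeline variant splits each batch into micro-batches, as is common; the actual schedule in model_parallel.py may differ. The function name `pipeline_schedule` and its parameters are hypothetical and do not come from this repository.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, for each time step, the micro-batch index each stage
    processes in the forward pass (None means the stage is idle).

    Hypothetical helper for illustration only; not part of this repo.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for step in range(total_steps):
        row = []
        for stage in range(num_stages):
            # Stage s sees micro-batch m one step after stage s-1 does.
            microbatch = step - stage
            row.append(microbatch if 0 <= microbatch < num_microbatches else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    # 2 stages (one per GPU), 4 micro-batches per batch.
    for step, row in enumerate(pipeline_schedule(2, 4)):
        print(step, row)
```

With a single micro-batch (`pipeline_schedule(2, 1)`), the schedule reduces to plain model parallelism, where one of the two GPUs is idle at every step. With more micro-batches, both GPUs are busy in all but the first and last step.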

Local_rank and Local_size are hard-coded in m_LSGD.py.
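One possible way to avoid the hard-coded values is to read them from the Slurm environment. The sketch below assumes the job is launched with `srun` and `--ntasks-per-node`, so that `SLURM_LOCALID` and `SLURM_NTASKS_PER_NODE` are set. The helper name `local_rank_and_size` and its defaults are hypothetical and are not part of m_LSGD.py.

```python
import os


def local_rank_and_size(env=None, default_rank=0, default_size=1):
    """Derive the node-local rank and node-local process count.

    Assumption: Slurm sets SLURM_LOCALID (node-local task ID) and
    SLURM_NTASKS_PER_NODE (only when --ntasks-per-node is given).
    Falls back to the defaults when the variables are missing.
    """
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_LOCALID", default_rank))
    size = int(env.get("SLURM_NTASKS_PER_NODE", default_size))
    if not 0 <= rank < size:
        raise ValueError(f"local rank {rank} is outside [0, {size})")
    return rank, size


if __name__ == "__main__":
    rank, size = local_rank_and_size()
    print(f"local_rank={rank} local_size={size}")
```

Passing a plain dictionary as `env` makes the helper easy to check without a Slurm allocation.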

Known issue: running on 2 nodes with 4 GPUs per node fails with a CUDA out-of-memory error.

About

In-node-model-parallel version of LSGD
