Skip to content
/ byteps Public
forked from bytedance/byteps

A high performance and generic framework for distributed DNN training

Notifications You must be signed in to change notification settings

asu-gkg/byteps

This branch is 35 commits ahead of bytedance/byteps:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

6e2df2e · Aug 6, 2024
Jul 28, 2024
Aug 6, 2024
Jul 28, 2024
Jul 31, 2021
Aug 6, 2024
Feb 10, 2022
Jul 28, 2024
Aug 6, 2024
May 7, 2019
May 7, 2019
Aug 6, 2024
Oct 12, 2021
Jul 29, 2024

Repository files navigation

BytePS

With slower network, BytePS offers even more performance advantages -- up to 2x of Horovod+NCCL. You can find more evaluation results at performance.md.

Quick Start

We provide a step-by-step tutorial for you to run benchmark training tasks. The simplest way to start is to use our docker images. Refer to Documentations for how to launch distributed jobs and more detailed configurations. After you can start BytePS, read best practice to get the best performance.

Below, we explain how to install BytePS by yourself. There are two options.

Install by pip

pip3 install byteps

Use Docker file to develop

sudo docker import byteps.tar byteps:0.1
sudo docker run \
    --gpus all      \
    --device /dev/nvidia0:/dev/nvidia0         \
    --device /dev/nvidiactl:/dev/nvidiactl     \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm   \
    --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --name=bps \
    --shm-size=32768m \
    --net=host    \
    -itd byteps:0.1 \
    zsh
sudo docker exec -it bps zsh
sudo /etc/init.d/ssh start
docker export -o byteps.tar bps

Build from source code

You can try out the latest features by directly installing from master branch:

git clone https://github.com/asu-gkg/byteps.git
cd byteps
python3 setup.py install

Notes for above two options:

  • BytePS assumes that you have already installed one or more of the following frameworks: TensorFlow / PyTorch / MXNet.
  • BytePS depends on CUDA and NCCL. You should specify the NCCL path with export BYTEPS_NCCL_HOME=/path/to/nccl. By default it points to /usr/local/nccl.
  • The installation requires gcc>=4.9. If you are working on CentOS/Redhat and have gcc<4.9, you can try yum install devtoolset-7 before everything else. In general, we recommend using gcc 4.9 for best compatibility (how to pin gcc).
  • RDMA support: During setup, the script will automatically detect the RDMA header file. If you want to use RDMA, make sure your RDMA environment has been properly installed and tested before install (install on Ubuntu-18.04).

About

A high performance and generic framework for distributed DNN training

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 61.3%
  • Python 38.5%
  • Makefile 0.2%