With a slower network, BytePS offers even greater performance advantages: up to 2x the performance of Horovod+NCCL. You can find more evaluation results in performance.md.
We provide a step-by-step tutorial for running benchmark training tasks. The simplest way to start is to use our Docker images. Refer to the Documentation for how to launch distributed jobs and for more detailed configurations. Once you can start BytePS, read the best practice guide to get the best performance.
Below, we explain how to install BytePS yourself. There are two options.
Option 1: install from PyPI:
pip3 install byteps
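If you work inside Docker, the commands below import the image from byteps.tar, start a container named bps with GPU access, open a shell in it, and start the SSH service. The final docker export command saves the container back to byteps.tar.
sudo docker import byteps.tar byteps:0.1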
sudo docker run \
--gpus all \
--device /dev/nvidia0:/dev/nvidia0 \
--device /dev/nvidiactl:/dev/nvidiactl \
--device /dev/nvidia-uvm:/dev/nvidia-uvm \
--device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--name=bps \
--shm-size=32768m \
--net=host \
-itd byteps:0.1 \
zsh
sudo docker exec -it bps zsh
sudo /etc/init.d/ssh start
sudo docker export -o byteps.tar bps
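Because docker run fails with an "error gathering device information" message when a path passed to --device does not exist on the host (/dev/nvidia-uvm-tools, for example, is not always present), it can help to check the device nodes first. The sketch below uses a hypothetical helper, check_devices, which is not part of BytePS; it only tests that each path exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of BytePS): verify that every device node
# passed to `docker run --device` exists on the host before starting a container.
check_devices() {
  _missing=0
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "missing device: $dev" >&2
      _missing=1
    fi
  done
  # Non-zero exit status if any device node is absent.
  return "$_missing"
}

# Example: check the nodes used by the docker run command above.
check_devices /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools \
  || echo "fix the missing device nodes before running docker" >&2
```

Running this before sudo docker run surfaces a missing node with a clearer message than the Docker daemon error.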
Option 2: install from source. You can try out the latest features by installing directly from the master branch:
git clone https://github.com/asu-gkg/byteps.git
cd byteps
python3 setup.py install
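After installing with either option, a quick sanity check is to import the BytePS binding for each framework you use. The sketch below assumes the bindings are exposed as the Python modules byteps.torch, byteps.tensorflow and byteps.mxnet (the naming used by upstream BytePS); check_byteps_bindings is a hypothetical helper. A framework you have not installed will simply be reported as not importable.

```shell
#!/bin/sh
# Hypothetical helper: try to import each BytePS framework binding with python3
# and report which ones load. Assumes the module names byteps.torch,
# byteps.tensorflow and byteps.mxnet (as in upstream BytePS).
check_byteps_bindings() {
  _found=0
  for mod in byteps.torch byteps.tensorflow byteps.mxnet; do
    if python3 -c "import $mod" >/dev/null 2>&1; then
      echo "$mod: ok"
      _found=1
    else
      echo "$mod: not importable"
    fi
  done
  # Succeed only if at least one binding could be imported.
  [ "$_found" -eq 1 ]
}

check_byteps_bindings || echo "no BytePS binding could be imported" >&2
```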
Notes for the two options above:
- BytePS assumes that you have already installed one or more of the following frameworks: TensorFlow / PyTorch / MXNet.
- BytePS depends on CUDA and NCCL. You should specify the NCCL path with
export BYTEPS_NCCL_HOME=/path/to/nccl
By default, it points to /usr/local/nccl.
- The installation requires gcc>=4.9. If you are working on CentOS/Red Hat and have gcc<4.9, you can try
yum install devtoolset-7
before everything else, then activate the newer toolchain in your shell with
scl enable devtoolset-7 bash
In general, we recommend using gcc 4.9 for best compatibility (how to pin gcc).
- RDMA support: During setup, the script automatically detects the RDMA header file. If you want to use RDMA, make sure your RDMA environment has been properly installed and tested before installing BytePS (install on Ubuntu-18.04).
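The three requirements above (gcc version, NCCL location, RDMA header) can be checked before building. This is a minimal sketch, not part of BytePS: version_ge, check_gcc, nccl_home and has_rdma_header are hypothetical helpers. It assumes GNU sort with -V for version comparison, assumes NCCL's header lives at include/nccl.h under BYTEPS_NCCL_HOME (falling back to /usr/local/nccl as described above), and probes rdma/rdma_cma.h only as an assumed stand-in for whichever header the setup script actually detects.

```shell
#!/bin/sh
# Hypothetical pre-install checks (not part of BytePS).

# version_ge A B: succeed if version A >= version B (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_gcc: BytePS requires gcc >= 4.9.
check_gcc() {
  _ver=$(gcc -dumpversion 2>/dev/null) || { echo "gcc not found" >&2; return 1; }
  if version_ge "$_ver" 4.9; then
    echo "gcc $_ver: ok"
  else
    echo "gcc $_ver is older than 4.9" >&2
    return 1
  fi
}

# nccl_home: print the NCCL path the build would use and warn if the
# header is missing. Assumed layout: $BYTEPS_NCCL_HOME/include/nccl.h.
nccl_home() {
  _home=${BYTEPS_NCCL_HOME:-/usr/local/nccl}
  echo "$_home"
  if [ ! -f "$_home/include/nccl.h" ]; then
    echo "warning: $_home/include/nccl.h not found" >&2
    return 1
  fi
}

# has_rdma_header: ask the C preprocessor whether an RDMA header is visible.
# Assumption: rdma/rdma_cma.h stands in for the header setup actually checks.
has_rdma_header() {
  printf '#include <rdma/rdma_cma.h>\n' | cpp >/dev/null 2>&1
}

check_gcc
nccl_home >/dev/null && echo "NCCL header found"
if has_rdma_header; then
  echo "RDMA header found"
else
  echo "RDMA header not found"
fi
```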