To use Horovod with the Intel(R) Machine Learning Scaling Library (Intel(R) MLSL), follow the steps below.
- Install Intel MLSL.
To install Intel MLSL, follow these steps.
Source mlslvars.sh to start using Intel MLSL. Two modes are available: process (default) and thread. Use the thread mode if you plan to set one or more MLSL servers via the MLSL_NUM_SERVERS environment variable.
$ source <install_dir>/intel64/bin/mlslvars.sh [mode]
- Install the Intel(R) MPI Library.
To install the Intel MPI Library, follow these steps.
Source the mpivars.sh script to establish the proper environment settings.
$ source <installdir_MPI>/intel64/bin/mpivars.sh release_mt
- Install Horovod from source code.
$ python setup.py build
$ python setup.py install
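The mode rule from the Intel MLSL step (thread mode when one or more MLSL servers are requested, process mode otherwise) can be sketched as a small shell helper. This is only an illustration: mlsl_mode is a hypothetical function name, not part of Intel MLSL or Horovod, and the script prints the source command instead of running it because <install_dir> is a placeholder.

```shell
# Hypothetical helper (not shipped with Intel MLSL): map a planned
# MLSL_NUM_SERVERS value to the mlslvars.sh mode described above.
mlsl_mode() {
    if [ "${1:-0}" -gt 0 ]; then
        echo thread     # one or more MLSL servers: use thread mode
    else
        echo process    # no MLSL servers: the default mode is enough
    fi
}

# Example value; replace with the number of servers you plan to use.
MLSL_NUM_SERVERS=2

# Print the command to run; <install_dir> is left as a placeholder.
echo "source <install_dir>/intel64/bin/mlslvars.sh $(mlsl_mode "$MLSL_NUM_SERVERS")"
```

Deciding the mode from the same variable that is exported later keeps the two settings from drifting apart.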
Advanced: You can specify the affinity for BackgroundThread with the HOROVOD_MLSL_BGT_AFFINITY environment variable. See the instructions below.
Set Horovod background thread affinity:
$ export HOROVOD_MLSL_BGT_AFFINITY=c0
where c0 is the ID of the core to attach the background thread to.
Set the number of MLSL servers:
$ export MLSL_NUM_SERVERS=X
where X is the number of cores you’d like to dedicate to driving communication. Every rank then has X MLSL servers available.
Set MLSL server affinity:
$ export MLSL_SERVER_AFFINITY=c1,c2,..,cX
where c1,c2,..,cX are the core IDs dedicated to MLSL servers (by default, the ‘last’ X cores are used). This variable sets the affinity for all MLSL servers on a node, that is, the MLSL_NUM_SERVERS * (number of ranks per node) servers available to all the ranks running on that node.
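As a rough sketch of how these settings fit together, the script below builds an explicit MLSL_SERVER_AFFINITY list from the last cores of a node. It assumes the list should contain one core ID per MLSL server on the node (MLSL_NUM_SERVERS * ranks per node), which is one reading of the description above. last_cores is a hypothetical helper, and the core and rank counts are example values only.

```shell
# Hypothetical helper (not part of Intel MLSL or Horovod): print a
# comma-separated list of the last N core IDs on a node, where
# N = servers per rank * ranks per node.
# Usage: last_cores <total_cores> <servers_per_rank> <ranks_per_node>
last_cores() {
    total_cores=$1
    count=$(( $2 * $3 ))
    first=$(( total_cores - count ))
    if [ "$count" -le 0 ] || [ "$first" -lt 0 ]; then
        echo "last_cores: need 0 < servers * ranks <= total cores" >&2
        return 1
    fi
    list=""
    core=$first
    while [ "$core" -lt "$total_cores" ]; do
        list="${list:+$list,}$core"   # append with a comma after the first entry
        core=$(( core + 1 ))
    done
    printf '%s\n' "$list"
}

# Example values: a 32-core node running 2 ranks, 2 MLSL servers per rank.
export HOROVOD_MLSL_BGT_AFFINITY=0
export MLSL_NUM_SERVERS=2
export MLSL_SERVER_AFFINITY="$(last_cores 32 "$MLSL_NUM_SERVERS" 2)"
echo "$MLSL_SERVER_AFFINITY"   # prints 28,29,30,31
```

Keeping the background thread on a core outside the server list, as in this example, avoids pinning both to the same core.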