
Commit 5ed134e

Peng Meng authored and srowen committed
[SPARK-21305][ML][MLLIB] Add options to disable multi-threading of native BLAS
## What changes were proposed in this pull request?

Many ML/MLLIB algorithms use native BLAS (like Intel MKL, ATLAS, OpenBLAS) to improve performance. Many popular native BLAS libraries, like Intel MKL and OpenBLAS, use multi-threading, which can conflict with Spark. Spark should provide options to disable multi-threading of native BLAS.

https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications

## How was this patch tested?

The existing unit tests.

Author: Peng Meng <[email protected]>

Closes apache#18551 from mpjlu/optimzeBLAS.
1 parent f587d2e commit 5ed134e


2 files changed (+10, -0 lines)


conf/spark-env.sh.template (+4)

```diff
@@ -61,3 +61,7 @@
 # - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
 # - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
 # - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
+# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
+# You might get better performance to enable these options if using native BLAS (see SPARK-21305).
+# - MKL_NUM_THREADS=1 Disable multi-threading of Intel MKL
+# - OPENBLAS_NUM_THREADS=1 Disable multi-threading of OpenBLAS
```
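The template only documents these variables as comments. A minimal sketch of enabling them, assuming you have copied `conf/spark-env.sh.template` to `conf/spark-env.sh` (the file Spark's launch scripts source), might look like the following. The values mirror the `=1` settings listed in the template; how the variables reach executor processes depends on the cluster manager, so treat this as a starting point rather than a complete deployment recipe.

```shell
#!/bin/sh
# conf/spark-env.sh, copied from conf/spark-env.sh.template (assumed setup).
# Limit native BLAS libraries to a single thread per operation (see SPARK-21305).
export MKL_NUM_THREADS=1       # Intel MKL
export OPENBLAS_NUM_THREADS=1  # OpenBLAS
```

`export` is used so the snippet also behaves correctly when sourced on its own; only the variable matching the BLAS implementation actually in use has any effect.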

docs/ml-guide.md (+6)

```diff
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
 project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
 platform's additional installation instructions.

+The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model.
+
+Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1.
+
+Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded).
+
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.

 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
```
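The guide text suggests matching the BLAS thread count to the number of cores each Spark task uses (`spark.task.cpus`, 1 by default). A minimal sketch, assuming the application is launched with `spark-submit`, builds the corresponding `--conf` flags. `spark.executorEnv.<NAME>` is Spark's setting for adding an environment variable to executor processes; `TASK_CPUS`, `com.example.Main`, and `app.jar` are hypothetical placeholders, not part of this commit.

```shell
#!/bin/sh
# Hypothetical variable: cores each Spark task uses (spark.task.cpus, default 1).
TASK_CPUS=1

# Build --conf flags that give executors a BLAS thread count equal to TASK_CPUS.
# spark.executorEnv.<NAME> sets environment variable NAME in executor processes.
# The backslash-newline sequences inside the double quotes are line continuations.
BLAS_CONF="--conf spark.task.cpus=${TASK_CPUS} \
--conf spark.executorEnv.MKL_NUM_THREADS=${TASK_CPUS} \
--conf spark.executorEnv.OPENBLAS_NUM_THREADS=${TASK_CPUS}"

# Example invocation (placeholders; $BLAS_CONF is left unquoted on purpose so it
# splits into separate arguments):
# spark-submit $BLAS_CONF --class com.example.Main app.jar
echo "$BLAS_CONF"
```

Because these settings only reach executor processes, BLAS calls made in the driver (for example in local mode, where tasks run inside the driver JVM) would likely need the variables set in the driver's own environment instead.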
