-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve build speed for stdlib #908
Comments
These are very illuminating findings. When it comes to stdlib, I think we need to keep in mind that:
Currently, the modules provided are I think it is okay to have ! current way
module stdlib_lapack
interface
! ... all the drivers and computational procedures ...
! ... but not auxiliary routines ...
end interface
end module In the future this can be split by topic, e.g. module stdlib_lapack_lu
interface
! ... LU-related procedures ...
end interface
end module
module stdlib_lapack
use stdlib_lapack_lu
! ...
interface
! ... procedures which fit no where else
end interface
end module I think the auxiliary routines should remain private, because they are not described in the LAPACK user's guide. We do not want to expose more public API surface than necessary. Concerning remark 4 in your list, I think sometimes such procedure are introduced to counteract unwanted compiler optimizations, say by hiding an expression behind an external procedure, to make sure it doesn't get optimized away. |
Ever since adding BLAS/LAPACK to stdlib the build times have exploded.
I have observed in my local machine (Windows + gfortran + CMake + Ninja with 16 threads) that the build takes 199 seconds. Using this tool https://ui.perfetto.dev/ as suggested by @ivan-pi it is possible to load the .ninja_log file and observe where are the bottle necks:
In the figure we observe the build chocking up at the lapack per kind files.
I started a major refactoring of stdlib's BLAS/LAPACK here with help from @ivan-pi and @perazz : https://github.com/jalvesz/stdlib/tree/split_lapack using a python script to rewrite the blas/lapack-per-kind files into submodule files per groups.
Doing that I managed to get down the build time to 38 seconds (x5 faster):
There are still some spots in which the build stalls but it is quite evident that by splitting into several submodule files enables better parallel build.
Before opening a PR, I would like to gather some broad opinions on this refactoring.
Some observations/open questions:
In order to achieve this splitting, the
refactor_blaslapack_subm.py
script contains two dictionariesblas_groups
andlapack_groups
. They define the procedures to be included in each submodule file. This splitting started by following the lapack reference manual, then I had to arbitrarily split further. I'm sure that with a proper dependency graph it could be possible to achieve a better grouping. If anyone feels like proposing a mechanism to find such grouping that would be very nice. Here a couple of references I've found interesting about such topic: https://paulcavallaro.com/blog/optimizing-function-placement-with-call-chain-clustering/ & https://medium.com/@avidaneran/using-a-graph-representation-to-analyze-python-dependencies-a57cd681fa09In the draft branch, the per-kind files were moved to a
legacy
folder. When opening the PR I think these should not be there as there is no point in having so many duplicates of the sources. Nonetheless, the splitting script can be improved and it is based on those files. Should these files be saved in a separate repo for reference?The submodules approach is proving very neat to organize and increase build performance. Now, this implies a very large header module file in order to have all interfaces for lapack functions. A better work on (1) could probably lead to finding clearer macro dependencies which could enable splitting also this header module.
While performing this refactoring I noticed that there are some procedures which I don't understand why should they be kept such as
lamc3
, which simply doesc = a + b
... is there a reason to actually keep such a function?cc: @perazz @jvdp1 @fortran-lang/stdlib
The text was updated successfully, but these errors were encountered: