Although vanilla Python is fairly slow and hence not an obvious candidate for computationally intensive work, there are several options to significantly increase the efficiency of Python programs.
When you complete this training you will
- understand and identify performance bottlenecks of Python;
- know some libraries, such as numpy, numexpr and numba, that can help improve performance for scientific computing;
- be able to use Cython to improve your code's performance;
- be able to wrap C, C++ and Fortran code to use it from Python;
- understand the opportunities and pitfalls of multi-threaded programming with Python;
- be able to write distributed applications using MPI;
- have an understanding of how frameworks for distributed computing such as dask and pyspark work.
Total duration: 4 hours.
Subject | Duration |
---|---|
introduction and motivation | 5 min. |
performance and profiling | 25 min. |
libraries | 10 min. |
Cython | 60 min. |
coffee break | 10 min. |
interfacing with C/C++/Fortran | 30 min. |
multi-threaded programming | 10 min. |
MPI | 45 min. |
dask | 15 min. |
pyspark | 20 min. |
wrap up | 10 min. |
Slides are available in the GitHub repository, as well as example code and hands-on material.
Instructions on how to create the required software environment are available.
This training is for you if you need to use Python for computationally intensive scientific computing.
You will need experience programming in Python and using numpy, as well as a passing familiarity with C/C++. This is not a training that starts from scratch.
If you plan to do Python programming in a Linux or HPC environment, you should be familiar with that environment as well.
- Geert Jan Bex ([email protected])