GPU-accelerated Lattice Boltzmann in Python
- Free software: MIT license
- Single-GPU performance (2D): 650 MLUPS on V100
Install the Anaconda package manager from www.anaconda.org
Create a new conda environment and install all dependencies:
conda create -n lettuce -c pytorch -c conda-forge "pytorch>=1.1" matplotlib pytest click cudatoolkit
Activate the conda environment:
conda activate lettuce
Clone this repository from GitHub
Change into the cloned directory
Run the install script:
python setup.py install
Run the test cases:
python setup.py test
Check out the convergence order, running on CPU:
lettuce --no-cuda convergence
To run a CUDA-driven LBM simulation on one GPU, omit the --no-cuda flag. If CUDA is not found, make sure that the CUDA drivers are installed and compatible with the installed cudatoolkit (see the conda create command above).
Check out the performance, running on GPU:
lettuce benchmark
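Performance figures for lattice Boltzmann codes are commonly given in MLUPS (million lattice updates per second): the number of lattice nodes times the number of time steps, divided by the wall-clock time in seconds and by one million. A minimal sketch of that arithmetic follows; the helper name `mlups` and the timing numbers are illustrative assumptions and are not part of the lettuce API.

```python
import math


def mlups(grid_shape, num_steps, seconds):
    """Million lattice updates per second for a run on a regular grid."""
    nodes = math.prod(grid_shape)  # total number of lattice nodes
    return nodes * num_steps / (seconds * 1e6)


# Hypothetical timing: 1000 steps on a 256 x 256 grid taking 0.1 s
print("MLUPS:", mlups((256, 256), num_steps=1000, seconds=0.1))
```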
The following Python code will run a two-dimensional Taylor-Green vortex on a GPU:
import torch
from lettuce import BGKCollision, StandardStreaming, Lattice, D2Q9, TaylorGreenVortex2D, Simulation
device = "cuda:0" # for running on cpu: device = "cpu"
dtype = torch.float32
lattice = Lattice(D2Q9, device, dtype)
flow = TaylorGreenVortex2D(resolution=256, reynolds_number=10, mach_number=0.05, lattice=lattice)
collision = BGKCollision(lattice, tau=flow.units.relaxation_parameter_lu)
streaming = StandardStreaming(lattice)
simulation = Simulation(flow=flow, lattice=lattice, collision=collision, streaming=streaming)
mlups = simulation.step(num_steps=1000)
print("Performance in MLUPS:", mlups)
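The relaxation time passed to the collision operator above is determined by the Reynolds and Mach numbers of the flow. As a rough sketch, the standard BGK relation tau = nu / c_s^2 + 1/2 (with c_s = 1/sqrt(3) in lattice units) can be evaluated by hand. The function name below is hypothetical, and the sketch assumes that the grid resolution serves as the characteristic length; the library's own unit conversion may follow different conventions.

```python
import math


def bgk_relaxation_time(resolution, reynolds_number, mach_number):
    """Estimate the BGK relaxation time in lattice units (assumed conventions)."""
    cs = 1.0 / math.sqrt(3.0)                    # lattice speed of sound
    u_lu = mach_number * cs                      # characteristic velocity
    nu_lu = u_lu * resolution / reynolds_number  # kinematic viscosity
    return nu_lu / cs**2 + 0.5


tau = bgk_relaxation_time(resolution=256, reynolds_number=10, mach_number=0.05)
print(f"tau = {tau:.3f}")  # tau = 2.717
```

Values of tau close to 0.5 correspond to very low viscosity, which is where plain BGK collision tends to become less stable.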
- Jonas Latt's approach of storing f_i-w_i instead of f_i, for better numerical accuracy at 16-bit precision; this can be added as a different Lattice class.
- Benchmark storage formats for f (either Qx... or ...xQ) -- also add as a different Lattice class?
- Standard Streaming and BGK collision as C++ functions, as an example and for testing performance gains; see https://pytorch.org/tutorials/advanced/cpp_extension.html
- Boundary Conditions.
- Multi-block lattices.
- Semi-Lagrangian streaming step (specific benefit from half-precision, utilization of tensor cores on Volta cards).
- Utilize multiple CPUs. Starting point: pytorch/pytorch#9873
- Utilize MPI to scale across multiple nodes. Starting point: https://pytorch.org/tutorials/intermediate/dist_tuto.html
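To illustrate why the first item in the list above (storing f_i - w_i instead of f_i) can help at 16-bit precision, the following sketch rounds D2Q9 populations to half precision in both forms and compares the recovered density. It is a standalone illustration using only the Python standard library, with an assumed density perturbation of 1e-3 and a simplified zero-velocity equilibrium f_i = w_i * rho. It does not use or reflect lettuce's Lattice classes.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


# D2Q9 lattice weights: rest, four axis-aligned, four diagonal directions
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def density_errors(delta):
    """Return (direct, shifted) absolute errors in the recovered density."""
    rho = 1.0 + delta
    f = [w * rho for w in W]  # zero-velocity equilibrium populations
    # Direct storage: each f_i is rounded to half precision.
    rho_direct = sum(to_half(fi) for fi in f)
    # Shifted storage: only the deviation f_i - w_i is rounded.
    rho_shifted = 1.0 + sum(to_half(fi - w) for fi, w in zip(f, W))
    return abs(rho_direct - rho), abs(rho_shifted - rho)


direct, shifted = density_errors(delta=1e-3)
print(f"direct: {direct:.2e}, shifted: {shifted:.2e}")
```

Under these assumptions the shifted form recovers the density perturbation much more accurately, because the limited half-precision mantissa is spent on the small deviation rather than on the constant weight.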
We use the following third-party packages:
- pytorch
- numpy
- pytest
- click
- matplotlib
- versioneer
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.