forked from iree-org/iree
Currently `iree-compile` generates PTX and embeds it in the vmfb file. The PTX is compiled at runtime using `cuModuleLoadDataEx`. This is insufficient in three ways:

- Different `ptxas` compiler versions cannot be tried.
- `ptxas` might behave differently from `cuModuleLoadDataEx`.
- Keeping PTX increases the vmfb file size, although it is also great for JIT compilation.

This PR implements the following:

- Introduces three flags in `iree-compile`: `--iree-hal-use-ptxas=true/false`, `--iree-hal-use-ptxas-from=<path>`, `--iree-hal-use-ptxas-params=<params>`.
- When `--iree-hal-use-ptxas=true`, compiles the IREE-generated PTX and keeps it as a `cubin` in the `vmfb` file. This reduces the file size significantly.
- If `--iree-hal-use-ptxas-from=<path>` is not present, searches for `ptxas` on the path.
- When `--iree-hal-use-ptxas=false`, packs the PTX into the vmfb file and lets the runtime compile it.

The flags can be used as shown below:

```
iree-compile code.mlir --iree-hal-use-ptxas-from=/usr/local/cuda-11.8/bin/ptxas -o code.vmfb
NOTE: Compiling the generated PTX code
$ /usr/local/cuda-11.8/bin/ptxas -arch sm_80 /tmp/iree-cuda-ptx-src-1a146b -o /tmp/iree-cuda-ptx-src-1a146b.cubin 2> /tmp/iree-cuda-ptx-log-0de384
```

One can pass `-v` to see extra information such as register spilling and static shared and local memory usage.

```
iree-compile code.mlir --iree-hal-use-ptxas-from=/usr/local/cuda-11.8/bin/ptxas --iree-hal-use-ptxas-params=-v -o code.vmfb
NOTE: Compiling the generated PTX code
$ /usr/local/cuda-11.8/bin/ptxas -arch sm_80 -v /tmp/iree-cuda-ptx-src-ceb4df -o /tmp/iree-cuda-ptx-src-ceb4df.cubin 2> /tmp/iree-cuda-ptx-log-dec4da
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'matmul_dispatch_0_matmul_1024x1024x1024' for 'sm_80'
ptxas info    : Function properties for matmul_dispatch_0_matmul_1024x1024x1024
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 188 registers, 376 bytes cmem[0]
```
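To make the flag handling easier to follow, here is a minimal Python sketch of how the `ptxas` invocation echoed in the logs could be assembled. It is not the code in this commit (which lives in the IREE compiler); the function names `resolve_ptxas` and `build_ptxas_command` are hypothetical, and the lookup and argument order are assumptions inferred only from the description and the logged command lines.

```python
import shlex
import shutil
from typing import List, Optional


def resolve_ptxas(ptxas_from: Optional[str] = None) -> Optional[str]:
    """Pick the ptxas executable (hypothetical helper).

    An explicit path, as given via --iree-hal-use-ptxas-from, wins.
    Otherwise fall back to a PATH lookup. Returns None if nothing is found,
    in which case a caller could keep the PTX and let the runtime compile it.
    """
    if ptxas_from:
        return ptxas_from
    return shutil.which("ptxas")


def build_ptxas_command(ptxas: str, arch: str, ptx_path: str,
                        extra_params: str = "") -> List[str]:
    """Assemble an argv shaped like the command shown in the logs.

    extra_params stands in for the --iree-hal-use-ptxas-params value and is
    split shell-style, so "-v" becomes a single extra argument.
    """
    return [
        ptxas,
        "-arch", arch,
        *shlex.split(extra_params),
        ptx_path,
        "-o", ptx_path + ".cubin",
    ]


if __name__ == "__main__":
    ptxas = resolve_ptxas("/usr/local/cuda-11.8/bin/ptxas")
    cmd = build_ptxas_command(ptxas, "sm_80",
                              "/tmp/iree-cuda-ptx-src-ceb4df", "-v")
    print(" ".join(cmd))
```

The sketch only builds the command; actually running it (for example with `subprocess.run`, redirecting stderr to a log file as the logged command does) and packing the resulting `cubin` into the vmfb are left out.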
Showing 1 changed file with 168 additions and 4 deletions.