Cuke is a source-to-source compiler that translates tensor computations written in Python into C++/CUDA code. It was initially developed to teach compiler optimization at the University of Iowa and has since evolved into a platform for constructing domain-specific compilers for different applications.
Make sure to use Python 3.8 or later:

```shell
conda create -n cuke python=3.8
conda activate cuke
```

Check out and install this repository:

```shell
git clone https://github.com/pengjiang-hpc/cuke
cd cuke
pip install .
```
An example of elementwise addition:

```python
# pycuke.asg contains the class definitions of Tensor and Operators.
from pycuke.asg import *
# pycuke.asg2ir contains the translation function from ASG to IR.
from pycuke.asg2ir import gen_ir
# pycuke.codegen is the module translating IR to C++ code.
import pycuke.codegen as codegen

# Create two tensor nodes, A and B, each of size 10.
A = Tensor((10, ))
B = Tensor((10, ))

# Create an elementwise add operator.
# 'A' and 'B' are the input nodes, 'res' is the output node.
res = A + B

# We now have a computational graph of three tensor nodes.
# 'gen_ir' invokes the ASG->IR translation and 'print_cpp' returns the generated C++ code.
code = codegen.cpu.print_cpp(gen_ir(res))
print(code)
```
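For intuition, the generated C++ is conceptually a single elementwise loop. Here is a minimal plain-Python sketch of that semantics (using lists instead of tensors; this is an illustration, not cuke's actual output):

```python
def elementwise_add(a, b):
    # Semantics of the generated code: one loop over the elements,
    # writing a[i] + b[i] into the corresponding output position.
    assert len(a) == len(b)
    res = [0] * len(a)
    for i in range(len(a)):
        res[i] = a[i] + b[i]
    return res

print(elementwise_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```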
An example of set intersection using conditional apply:

```python
import inspect

def is_in(x, li):
    src = inspect.cleandoc("""
    F = BinarySearch(LI, 0, LSIZE, X);
    """)
    found = Var(dtype='int')
    found.attr['is_arg'] = False
    # 'inline' is an operator for calling external functions.
    # F, LI, LSIZE, and X are placeholders that will be replaced by the tensor nodes.
    return inline(src, [('F', found)], [('X', x), ('LI', li), ('LSIZE', li._size()[0])])

def intersect(a, b):
    # Create an apply operator: 'a' is the input and 'cond' is the output.
    # The 'apply' operator invokes 'is_in' for each element of 'a' (x = a[i]).
    # 'cond' has the same size as 'a' and stores the result of 'is_in' for each
    # element of 'a' in the corresponding position (cond[i] = is_in(a[i], b)).
    cond = a.apply(lambda x: is_in(x, b))
    # Create a conditional apply operator.
    # For each element 'x' of 'a' (x = a[i], where i is the iterator),
    # if 'cond[i]' is true, we perform the assignment c[csize++] = a[i].
    # The size of 'c' (csize) is therefore not necessarily the same as 'a'.
    c = a.apply(lambda x: x, cond=cond)
    return c

A = Tensor((10, ))
B = Tensor((20, ))
res = intersect(A, B)
code = codegen.cpu.print_cpp(gen_ir(res))
print(code)
```
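To make the apply/conditional-apply semantics concrete, here is a plain-Python model of what the generated code computes: a binary search over a sorted `b`, followed by a compacting copy. The names mirror the example above but this is an illustrative sketch, not the pycuke API, and it assumes `b` is sorted (as the `BinarySearch` call requires).

```python
import bisect

def is_in(x, li):
    # Mirrors the BinarySearch call in the inline operator; li must be sorted.
    pos = bisect.bisect_left(li, x)
    return pos < len(li) and li[pos] == x

def intersect(a, b):
    # Step 1 (apply): cond[i] = is_in(a[i], b); cond has the same size as a.
    cond = [is_in(x, b) for x in a]
    # Step 2 (conditional apply): c[csize++] = a[i] whenever cond[i] is true,
    # so c is usually shorter than a.
    c = [x for x, keep in zip(a, cond) if keep]
    return c

print(intersect([1, 3, 5, 7], [2, 3, 4, 7]))  # [3, 7]
```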
More examples can be found in the apps folder.
Q: How is cuke different from Python libraries such as numpy/scipy/pytorch?
A: The main difference is that cuke is a compiler. Instead of calling pre-compiled code, it generates source code (e.g., C++, CUDA) that runs on different hardware. The code generation is achieved through an intermediate representation, which allows users to apply various optimization transformations, such as loop fusion, parallelization, data buffering, etc. As a result, the generated code often achieves better performance than library-based solutions. Extending cuke to support new hardware is also much easier as it only requires implementation of a new backend.
Q: How is cuke different from ML compilers such as TVM/XLA/TorchScript?
A: Cuke is more focused on supporting applications with irregular computation and memory access patterns. Its main differences from TVM/XLA/TorchScript include:
- It supports a more general syntax than basic tensor algebra and thus can express more general computations than neural networks.
- It allows users to compose customized operators using the basic syntax, enabling more aggressive code optimization based on application-specific information. For example, our IPDPS'24 paper shows that cuke can perform aggressive loop fusions that TVM/XLA/TorchScript miss.
- It supports inspector-executor compilation for indirect tensor indexing to reduce memory access overhead.
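As a generic illustration of the loop fusion mentioned above (not cuke's generated code): computing `(a + b) * c` as two separate elementwise loops materializes a temporary array, while the fused loop produces each element in one pass.

```python
def unfused(a, b, c):
    # Two loops: the intermediate 'tmp' is materialized in memory.
    tmp = [a[i] + b[i] for i in range(len(a))]
    return [tmp[i] * c[i] for i in range(len(a))]

def fused(a, b, c):
    # One fused loop: no temporary array, better locality.
    return [(a[i] + b[i]) * c[i] for i in range(len(a))]

a, b, c = [1, 2], [3, 4], [5, 6]
assert unfused(a, b, c) == fused(a, b, c) == [20, 36]
```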
```bibtex
@inproceedings{hu2023cuke,
  author    = {Lihan Hu and Jing Li and Peng Jiang},
  title     = {cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding},
  booktitle = {38th IEEE International Parallel \& Distributed Processing Symposium (IPDPS)},
  year      = {2024}
}
```