___ ____ __ __ _ ______ _ __
/ | / __ )____ _/ /______/ /_ (_)___ ____ _ / ____/___ ____ ___ ____ (_) /__ _____
/ /| | / __ / __ `/ __/ ___/ __ \/ / __ \/ __ `/ / / / __ \/ __ `__ \/ __ \/ / / _ \/ ___/
/ ___ | / /_/ / /_/ / /_/ /__/ / / / / / / / /_/ / / /___/ /_/ / / / / / / /_/ / / / __/ /
/_/ |_| /_____/\__,_/\__/\___/_/ /_/_/_/ /_/\__, / \____/\____/_/ /_/ /_/ .___/_/_/\___/_/
/____/ /_/
ABC is an optimizing compiler for Fully Homomorphic Encryption (FHE). FHE allows computation over encrypted data, but imposes a variety of cryptographic and engineering challenges. This compiler translates high-level program descriptions in a C-like language into the circuit-based programming paradigm of FHE. It does so while automating as many aspects of the development as possible, including automatically identifying and exploiting opportunities to use the powerful SIMD parallelism ("batching") present in many schemes.
- Repository's Structure
- Compilation
- Getting Started
- AST Representation
- Extending the Library
- References
The repository is organized as follow:
examples – simple examples (WIP)
include – header files (.h)
└ ast – classes to represents programs as an AST
└ parser – tokenizer and parser
└ runtime - the runtime system (execute an AST using an FHE library)
└ utilities – utilities for certain tasks on ASTs (e.g., printing in DOT language)
└ visitor – various visitors performing optimizations
libs – files required by CMake to download dependencies
src – source files (.cpp)
└ ast
└ parser
└ runtime
└ utilities
└ visitor
test – unit tests for all classes
We include a simple C-like high-level input language, and a parser that translates it into Abstract Syntax Trees (ASTs), which form our intermediate representation (IR).
The compilation itself can be divided into three stages:
1. Program Transformations. These AST-to-AST transformations aim to modify the program to make it more "FHE" friendly. These include optimizations common in standard compilers and FHE-specific optimizations like our automated batching optimization.
1. AST-to-Circuit Transformations. These transform the AST into a circuit by transforming non-compatible operations (e.g., If- and While-Statements) into their circuit-equivalent using gates. Note that instead of changing to a wires-and-gates IR, circuits are still expressed using (a subset of) the AST IR.
3. Circuit-to-Circuit Transformations. These transformations transform a circuit into a semantically equivalent circuit with better performance in FHE. For example, by rewriting the circuit to reduce the multiplicative depth. This allows using smaller parameters that, in turn, enable more efficient computation.
Finally, we include a runtime system that can take (circuit-compatible) ASTs and run them against FHE libraries (currently, only Microsoft SEAL is supported) or against a dummy scheme (for faster testing).
Before starting, make sure to clone this repository using:
git clone https://github.com/MarbleHE/ABC.git
The following tools are required to get this project running:
- cmake (version ≥ 3.15) to build the project
- tested with v3.15.5
- gcc or clang to compile the sources
- tested with Apple clang v11.0.0
- doxygen to build the documentation files (html)
- tested with v1.8.16
The easiest way to use this library is to import the project into CLion which automatically loads the containing cmake build files and adds the respective targets. Development was carried out on macO (10.15.2), although the project should be running on Windows or Linux machines too.
More detailed installation notes, including some for Debian/Ubuntu and Fedora are in INSTALL.md.
The entire framework is built around ASTs, the nodes of which are implemented by a class hierarchy that derives from AbstractNode (see above). The AST nodes "own" (std::unique_ptr) their children (and, transitively, the entire subgraph).
Our input language is essentially a toy version of C. We have a very simple hand-written parser on-top of a tokenizer (Stork) from another open source project. It takes a string and returns the root node of the generated AST. We also implemented the reverse, allowing us to print an AST back into this language, which is much more human-readable than our "real" intermediate representation (JSON of the AST).
The entire compiler relies heavily on the visitor pattern, and there is some template magic (see ScopedVisitor) that allows one to still use overloading in visitors, so one can e.g. handle all AbstractExpressions in one function if the visitor only cares about statements. This is mostly transparent to the developer.
All of our optimizations and transformations are implemented as visitors. Since the visitor pattern doesn't easily allow you to "return" values, we frequently use std::unordered_map<std::string, SOMETHING> , allowing us to associate data with (the unique ID of) an AST node. So when visiting an If stmt, we can visit both branches recursively and then get the "return" by looking into the hash_map at the child-nodes' ids.
Finally, our runtime system is also implemented as a visitor. It takes an "AbstractCiphertextFactory", the instantiations of which are basically just super thin wrappers around FHE libraries (currently just SEAL). So if we see a multiplication node in the tree, we just end up calling something like seal::multiply(ctxt a, ctxt b).
The AST consists of nodes that are derived from either AbstractExpression
or AbstractStatement
,
depending on whether the operation is an expression or a statement, respectively.
┌─────────────────────┐
│ AbstractNode │
└─────────────────────┘
▲
│
│
┌─────────────────────┐ │ ┌─────────────────────┐
│ AbstractStatement │─────┴─────│ AbstractExpression │
└─────────────────────┘ └─────────────────────┘
▲ ▲
┌─────────────────────┐ │ │ ┌─────────────────────┐
│ Assignment │────┤ ├─────│ AbstractTarget │
└─────────────────────┘ │ │ └─────────────────────┘
│ │ ▲
┌─────────────────────┐ │ │ │ ┌─────────────────────┐
│ Block │────┤ │ ├─────│ FunctionParameter │
└─────────────────────┘ │ │ │ └─────────────────────┘
│ │ │
┌─────────────────────┐ │ │ │ ┌─────────────────────┐
│ For │────┤ │ ├─────│ IndexAccess │
└─────────────────────┘ │ │ │ └─────────────────────┘
│ │ │
┌─────────────────────┐ │ │ │ ┌─────────────────────┐
│ Function │────┤ │ └─────│ Variable │
└─────────────────────┘ │ │ └─────────────────────┘
│ │
┌─────────────────────┐ │ │ ┌─────────────────────┐
│ If │────┤ ├─────│ BinaryExpression │
└─────────────────────┘ │ │ └─────────────────────┘
│ │
┌─────────────────────┐ │ │ ┌─────────────────────┐
│ Return │────┤ ├─────│ OperatorExpression │
└─────────────────────┘ │ │ └─────────────────────┘
│ │
┌─────────────────────┐ │ │ ┌─────────────────────┐
│ VariableDeclaration │────┘ ├─────│ UnaryExpression │
└─────────────────────┘ │ └─────────────────────┘
│
│ ┌─────────────────────┐
├─────│ Call │
│ └─────────────────────┘
│
│ ┌─────────────────────┐
├─────│ ExpressionList │
│ └─────────────────────┘
│
│ ┌─────────────────────┐
├─────│ Literal<T> │
│ └─────────────────────┘
│
│ ┌─────────────────────┐
└─────│ TernaryOperator │
└─────────────────────┘
Figure 1: Class hierarchy of the AST classes.
Following, the different node types are briefly explained. The examples in brackets show how the commands would look like in "plain" C++.
- Classes derived from
AbstractExpression
AbstractTarget
– represents "things that can be assigned to" (i.e., lvalues)FunctionParameter
– describes the parameters that a function accepts. To evaluate an AST, values must be passed for each of the parameter defined by the function'sFunctionParameter
node.IndexAccess
– represents the[]
operator in C++; has a target and an index valueVariable
– named lvalue (any string)
BinaryExpression
– an expression with two operands and one operator (e.g.,13 + 37
).OperatorExpression
– an operator with 0 or more operands (AbstractExpression
)UnaryExpression
– an expression with one operator and one operand (e.g.,!b
whereb
is a Boolean).Call
– a call to an internal function, i.e., its implementation is represented in the AST as a Function.ExpressionList
– rvalue with an order list of (potentially null) expressions (e.g.,v = {a, b, c}
assigns anExpressionList
to vectorv
)Literal<T>
– containing a scalar value of typeT
(bool
,char
,int
,float
,double
, orstd::string
)TernaryOperator
– has a condition, athen
-expression and aelse
expression (e.g.,(x < 3) ? 1 : 2
)
- Classes derived from
AbstractStatement
Assignment
– assigns an expression (rvalue) to a target (lvalue)Block
– a code block{...}
indicating scopes (e.g., thethen
-clause of anif
statement)For
– afor
-loop with an initializer, condition, update, and a bodyBlock
(i.e.,for(initializer; condition; update) { ... }
)Function
– function definition defining a name, list ofFunctionParameter
,Return
type, and a bodyBlock
.If
– defines a condition, athen
-branch and (optionally) anelse
-branch (e.g.,if (condition) { ... }
orif (condition) { ... } else { ... }
)Return
– a return statement of a method with optional expression to return (e.g.,return
)VariableDeclaration
– associates aVariable
with a type (Datatype
) and (optionally) a value (AbstractExpression
)
In general, PRs are very welcome! However, to ensure that this library keeps a high quality standard, this section introduces some conventions to be followed when extending this library.
The code is written in C++ and formatted according to code style file MarbleHE_CPP_Code_Style.xml. The file can be loaded into the IDE of your choice, for example, in CLion's preferences (Editor → Code Style → C/C++). As the style file can change at any time, please keep in mind to use the latest version before sending a PR.
This codebase was checked against the default C/C++ inspections provided in CLion.
Further, the static code checker cpplint is used that provides more advanced checks. It can be integrated into CLion using the CLion-cpplint plugin. To ensure consistency, pleasure use the following settings (to be provided in the plugin's options at Preferences -> cpplint option):
--linelength=120 --filter=-legal/copyright,-build/header_guard,-whitespace/comments,-runtime/references,-whitespace/operators
Doxygen comments are used to create a documentation of this library.
The documentation can be generated using the supplied configuration doxygen.conf
as described following:
doxygen Doxyfile
The code is covered by unit tests to achieve high code quality and avoid introducing errors while extending the library. For that, the Google Test framework is used. The library as well as any other dependencies are automatically cloned from its GitHub repository using cmake, see CMakeLists.txt.
The tests can be found in the test
directory and are named according to the class file that the test covers (e.g., MultDepthVisitorTest
for the test covering the MultDepthVisitor
class).
It is required to submit tests for newly added features to ensure correctness and avoid breaking the feature by future changes (regression test). Any PRs without tests will not be considered to be integrated.
This project uses GitHub flow, i.e. all work around an improvement/feature happens on a specific branch and is only merged into main once it passes all checks and has been reviewed.
[1] Viand, A., Shafagh, H.: Marble: Making Fully Homomorphic Encryption Accessible to All. In: Proceedings of the 6th workshop on encrypted computing & applied homomorphic cryptography. pp. 49–60 (2018).
[2] Aubry, P. et al.: Faster Homomorphic Encryption Is Not Enough: Improved Heuristic for Multiplicative Depth Minimization of Boolean Circuits. (2019).