Skip to content

unseddd/symcc

Repository files navigation


                             The Symbolic Compiler


This is a compiler wrapper which embeds symbolic execution into the program
during compilation. In essence, it inserts code that computes symbolic
expressions for each value in the program. The actual computation happens
through calls to a support library at run time.

To build the pass and the support library, make sure that LLVM 8 or 9 and Z3
version 4.5 or later are installed. (Alternatively, see below for using the
provided Dockerfile.) Make sure to pull the Qsym code:

$ git submodule init
$ git submodule update

Note that it is not necessary or recommended to build the Qsym submodule - our
build system will automatically extract the right source files and include them
in the build.

Create a build directory somewhere, and execute the following commands inside
it:

$ cmake -G Ninja -DQSYM_BACKEND=ON /path/to/compiler/sources
$ ninja check

If LLVM is installed in a non-standard location, add the CMake parameter
"-DLLVM_DIR=/path/to/llvm/cmake/module". Similarly, you can point to a
non-standard Z3 installation with "-DZ3_DIR=/path/to/z3/cmake/module" (which
requires Z3 to be built with CMake).

The main build artifact from the user's point of view is symcc, a wrapper script
around clang that sets the right options to load our pass and link against the
run-time library. (See below for additional C++ support.)

To try the compiler, take some simple C code like the following:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int foo(int a, int b) {
    if (2 * a < b)
        return a;
    else if (a % b)
        return b;
    else
        return a + b;
}

int main(int argc, char* argv[]) {
    int x;
    if (read(STDIN_FILENO, &x, sizeof(x)) != sizeof(x)) {
        printf("Failed to read x\n");
        return -1;
    }
    printf("%d\n", foo(x, 7));
    return 0;
}

Save the code as "test.c". To compile it with symbolic execution built in, we
call symcc as we would normally call clang:

$ ./symcc test.c -o test

Before starting the analysis, create a directory for the results and tell SymCC:

$ mkdir results
$ export SYMCC_OUTPUT_DIR=`pwd`/results

Then run the program like any other binary, providing an arbitrary input:

$ echo 'aaaa' | ./test

The program will execute the same computations as an uninstrumented version
would, but additionally the injected code will track computations symbolically
and attempt to compute diverging inputs at each branch point. All data that the
program reads from standard input is treated as symbolic; alternatively, you can
set the environment variable SYMCC_INPUT_FILE to the name of a file whose
contents will be treated as symbolic when read. Note that due how the Qsym
backend is implemented, all input has to be available from the start. In
particular, when providing symbolic data on standard input interactively, you
need to terminate your input by pressing Ctrl+D before the program starts to
execute.

When execution is finished, the result directory will contain the new test cases
generated during program execution. Try running the program again on one of
those or combine it with a fuzzer to automate this process (see below).


                                 Documentation

The directory "docs" contains documentation on several internal aspects of
SymCC, as well as building C++ code and running SymCC with a fuzzer.


                            Building a Docker image

If you prefer a Docker container over building SymCC natively, just call Docker
after pulling the Qsym code as above:

$ git submodule init
$ git submodule update
$ docker build -t symcc .
$ docker run -it --rm symcc

This will build a Docker image and run an ephemeral container to try out SymCC.
Inside the container, "symcc" is available as a drop-in replacement for "clang",
using the Qsym backend. If you want to try modifications to SymCC, you will find
the source code in "/symcc_source" and the build files in "/symcc_build";
recompile SymCC by running "ninja -C /symcc_build check".

About

SymCC: efficient compiler-based symbolic execution

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 52.9%
  • C 12.9%
  • Rust 12.3%
  • LLVM 12.0%
  • CMake 4.6%
  • Shell 3.1%
  • Other 2.2%