Skip to content

Latest commit

 

History

History
230 lines (184 loc) · 7.71 KB

README.md

File metadata and controls

230 lines (184 loc) · 7.71 KB

Adaptive Concretization for Parallel Program Synthesis

This project includes the testing infrastructure to empirically evaluate our adaptive concretization algorithm for parallel program synthesis. For more details about the algorithm and the usage of this infrastructure, visit our GitHub page here.

Publication

Structure

  • __init__.py : a placeholder to enable other python modules to use each other
  • benchmark/
  • clean.sh : script to clean up remnants
  • config.full.json : full version of experiments
  • config.short.json : short version of experiments
  • data/ : a place for Sketch outputs, figures, etc.
  • db.py : script to manipulate database
  • figure.py : script to draw figures
  • post.py : script to post-analyze Sketch output
  • psketch.py : script to run Sketch back-end in parallel
  • run.py : script to run experiments
  • simulate.py : script to simulate online strategies using database
  • util.py : utilities
  • wilcoxon.py : script to simulate Wilcoxon test using Monte Carlo method

Usage

This section explains script usages in the order of experiments conducted. For more details, again, refer to our GitHub page here.

run.py

This script is the main runner that runs Sketch in various settings:

  • running plain Sketch,
  • running only sketch-backend in parallel,
  • running Sketch with fixed randdegree (see our paper for the terms), and
  • running Sketch with adaptive concretization along with various numbers of cores.

All Sketch outputs will be saved under data/ folder.

$ ./run.py -r n [-b benchmark_name]* [-d degree]* [-c n] [--strategy WILCOXON] [-s] [--timeout t]

You can run multiple benchmarks and multiple degrees (if not using adaptive concretization):

$ ./run.py -r 3 -b colorchooser_demo -b reverse -d 16 -d 64 [-c 32]

To use adaptive concretization, specify the strategy as follows:

$ ./run.py [...] --strategy WILCOXON

To show our hypothesis that optimal randdegrees vary from benchmark to benchmark, we run Sketch with several fixed randdegree many times. To easily conduct these experiments, this script has a feature to run only back-end in parallel:

$ ./run.py -s [...]

If you're interested, see psketch.py which acts as a wrapper for Sketch's back-end script, cegis.

By default, run.py reads config.short.json, which is a subset of config.full.json in terms of benchmarks, cores, and degrees configurations. If you want to examine all benchmarks with all possible combinations of configurations, pass config.full.json to run.py:

$ ./run.py --config config.full.json [...]

post.py

This script interprets Sketch outputs (under data/ folder unless specifed elsewhere) and retrieves statistical information, such as elapsed time, number of trials, chosen randdegree, etc.:

$ ./post.py [-s] [-d data_dir] [-b benchmark_name]*

Instead of several outputs under a certain folder, you can also pass single file(s):

$ ./post.py (path/to/output_file)*

db.py

This script has various features to manipulate database:

  • -c init: initializing tables
  • -c clean: cleaning data in the tables
  • -c reset: resetting tables
  • -c register: registering statistics data (from post.py)
  • -c stat: retrieving statistics

By default, this script assumes to access to the database with username sketchperf and database name concretization. (See our GitHub page here for how to set up database.) If you are using different names, specify them:

$ ./db.py --user user_name --db db_name [...]

This script will read Sketch output files under data/ folder, but you can also specify either folder or single file(s):

$ ./db.py -d data_dir [...]
$ ./db.py (path/to/output_file)* [...]

For the experiment about back-end behaviors, similar to run.py, explicitly specify it (along with a distinct EID number):

$ ./db.py -s -e 11 [...]

If you want to see what queries this script makes, turn on the verbosity option:

$ ./db.py -v [...]

simulate.py

As a proof-of-concept, this script simulates several online concretization strategies, such as random, fixed degree, minimal time, and Wilcoxon-based statistical comparison. The calculation of expected running time is based on the empirical data in the database, especially the experiment about back-end behaviors, where SAT propagation information is recorded. If you use a different EID for that experiment (see our GitHub page here and/or db.py above), you may need to specify it.

$ ./simulate.py [-e EID]

Since the simulation depends on the empirical data in the database, similar options for the database are available:

$ ./simulate.py [--user user_name] [--db db_name]

The output of this script is simulated running time, along with variance and quantiles. I.e., the smaller, the better. However, Note that it is quite tricky to simulate parallel running, so this simulation result can not guarantee the best solution.

wilcoxon.py

This script exploits two Wilcoxon tests: Wilcoxon signed-rank test and Wilcoxon rank-sum test.

Degree Comparisons

To set a good confidential cutoff (so-called p-value) and the number of samplings, this script simulates Wilcoxon signed-rank test by using Monte Carlo method: given configuration of the number of sampling and two target degrees, it will randomly pick that number of samples from the database; compare the given degrees using that sampling; and repeat this process many times. The distributions of two degrees are computed as the multiplication of running time and search space, which could be retrieved from backend data set.

Similar to simulate.py, you may need to specify your own EID if any.

$ ./wilcoxon.py -c degree [-e EID]

This script uses the same backend data as simulate.py does, and similar options for the database are available:

$ ./wilcoxon.py -c degree [--user user_name] [--db db_name]
Performance Comparisons

Unlike degree comparisons, which literally compare two degrees running on the same benchmark, comparing performance results running on different design choices requires different statistical test: Wilcoxon rank-sum test, not Wilcoxon signed-rank test.

To statistically compare two performance results, you should specify corresponding EIDs:

$ ./wilcoxon.py -c perf -e EID1 -e EID2

figure.py

Although tables with detailed numbers are comprehensive, those are often too verbose and not really useful for presenting the meanings in the results. This script is designed to convert raw data generated by db.py to graphs.

For the experiment about back-end behaviors, similar to run.py and db.py above, explicitly specify it (along with a distinct EID number):

$ ./figure.py -s -e 11 [...]

It will generate figures per benchmark; x-axis of those figures is various degrees, whereas y-axis is either average size of SAT formulae propagation or average time of successful cases and failure cases. Besides, it will also generate so-called vee chart that shows the idea that optimal degrees vary from benchmark to benchmark.

For the experiments about adaptive concretization and its scalability, just run the script:

$ ./figure.py [...]

It will generate two figures: the one that compares adaptive concretization against the plain Sketch, and the other that contrasts adaptive concretization with different numbers of cores.