Interested in measuring Optuna's performance? You are very perceptive. Under this directory, you will find the scripts we have prepared for benchmarking Optuna.

In this document, we explain how we measure the performance of Optuna using these scripts. It covers performance benchmarks with `kurobako` (including multi-objective optimization), `bayesmark`, and NASLib.
## Performance Benchmarks with `kurobako`

We measure the performance of black-box optimization algorithms in Optuna with `kurobako` using `benchmarks/run_kurobako.py`.
You can run this script manually on GitHub Actions if you have write access to the repository, or you can execute `benchmarks/run_kurobako.py` locally. We explain both methods here.
### How to Run on GitHub Actions

You need write access to the repository. Please run the following steps in your own fork. Note that you should pull the latest master branch of Optuna, since the workflow YAML file must be placed in the default branch of the repository.
1. Open the GitHub page of your forked Optuna repository.
2. Click the `Actions` tab.
3. In the left sidebar, click `Performance Benchmarks with kurobako`.
4. Use the `Branch` dropdown to select the workflow's branch (the default is `master`), type the input parameters `Sampler List`, `Sampler Arguments List`, `Pruner List`, and `Pruner Arguments List` (example values are given at the end of this section), and click `Run workflow`.
5. After the workflow finishes, you can download the report and plot from `Artifacts`.

The report includes the version information of the environments, the solvers (pairs of the sampler and the pruner in Optuna) and the problems, the best objective value, AUC, elapsed time, and so on.
The plot shows the optimization history for each problem. The title is the name of the problem, and the legend indicates the specified pair of sampler and pruner. The history is averaged over the specified `n_runs` studies, with error bars. The horizontal axis represents the budget (#budgets * #epochs = \sum_{for each trial} (#consumed epochs in the trial)). The vertical axis represents the objective value.
Note that the default run time of a GitHub Actions workflow job is limited to 6 hours. Depending on the sampler and number of studies you specify, it may exceed the 6-hour limit and fail. See the official document for more details.
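For example, to reproduce the setting used in the local run shown later in this document, you could enter the following input parameters (illustrative values only):

- `Sampler List`: `RandomSampler TPESampler`
- `Sampler Arguments List`: `{} {}`
- `Pruner List`: `NopPruner`
- `Pruner Arguments List`: `{}`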
### How to Run Locally

You can also run `benchmarks/run_kurobako.py` directly. This section explains how to run it locally.
First, you need to install `kurobako` and its Python helper. To install `kurobako`, see https://github.com/optuna/kurobako#installation for more details. In addition, please run `pip install kurobako` to install the Python helper.
You also need to install `gnuplot` for visualization with `kurobako`. You can install `gnuplot` with a package manager such as `apt` (for Ubuntu) or `brew` (for macOS).
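On Ubuntu or macOS, for example, this boils down to commands like the following (a minimal sketch; adapt the package-manager line to your platform):

# Python helper for kurobako
% pip install kurobako
# gnuplot for plotting
% sudo apt install gnuplot   # Ubuntu
% brew install gnuplot       # macOS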
Second, you need to download the datasets for `kurobako`. Run the following commands in the dataset directory.
# Download hyperparameter optimization (HPO) dataset
% wget http://ml4aad.org/wp-content/uploads/2019/01/fcnet_tabular_benchmarks.tar.gz
% tar xf fcnet_tabular_benchmarks.tar.gz
# Download neural architecture search (NAS) dataset
# The `kurobako` command should be available.
% curl -L $(kurobako dataset nasbench url) -o nasbench_full.tfrecord
% kurobako dataset nasbench convert nasbench_full.tfrecord nasbench_full.bin
Finally, you can run `benchmarks/run_kurobako.py`.
% python benchmarks/run_kurobako.py \
--path-to-kurobako "" \ # If the `kurobako` command is available.
--name "performance-benchmarks" \
--n-runs 10 \
--n-jobs 10 \
--sampler-list "RandomSampler TPESampler" \
--sampler-kwargs-list "{} {}" \
--pruner-list "NopPruner" \
--pruner-kwargs-list "{}" \
--seed 0 \
--data-dir "." \
--out-dir "out"
Please see `benchmarks/run_kurobako.py` to check the arguments and their default values.
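Assuming the script uses a standard argparse-style command-line interface (an assumption based on the flag syntax shown above), you can also print the available options with:

% python benchmarks/run_kurobako.py --help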
### Multi-objective Optimization

We also have benchmarks for multi-objective optimization in `kurobako`. Note that we do not have pruner support for multi-objective optimization yet.
Multi-objective benchmarks can also be run from GitHub Actions or locally. To run them from GitHub Actions, please click `Performance Benchmarks with mo-kurobako` in Step 3. To run them locally, please run `benchmarks/run_mo_kurobako.py`.
% python benchmarks/run_mo_kurobako.py \
--path-to-kurobako "" \ # If the `kurobako` command is available.
--name "performance-benchmarks" \
--n-runs 10 \
--n-jobs 10 \
--sampler-list "RandomSampler TPESampler NSGAIISampler" \
--sampler-kwargs-list "{} {\"multivariate\":true,\"constant_liar\":true} {\"population_size\":20}" \
--seed 0 \
--data-dir "." \
--out-dir "out"
## Performance Benchmarks with `bayesmark`

This workflow benchmarks the optimization algorithms available in Optuna with `bayesmark`. It repeatedly performs hyperparameter search on a set of `scikit-learn` models fitted to a list of toy datasets and aggregates the results, which are then compared to the baseline provided by the random sampler. This benchmark can be run with GitHub Actions or locally.
1. Follow points 1 and 2 from Performance Benchmarks with `kurobako`.
2. In the left sidebar, click `Performance benchmarks with bayesmark`.
3. Select the branch to run the benchmark from, as well as the parameters, and click `Run workflow` to start the benchmark run.
After the workflow finishes, two artifacts are available for download:

- `benchmark-report` contains a markdown file with solver leaderboards for each problem, together with basic information on the benchmark setup.
- `benchmark-plots` is a set of optimization history plots for each solved problem. Similarly to `kurobako`, each plot shows the objective value as a function of finished trials. For each problem, the average and median over `n_runs` runs are shown. If the `Include warm-up steps in plots` checkbox was not selected in the workflow config, the first 10 trials are excluded from the visualizations.
See the `bayesmark` documentation for more information on its scoring.
CI runs benchmarks on all model/dataset combinations in parallel; however, running the benchmark on a single problem locally is also possible. To do this, first install the required Python packages.
pip install bayesmark matplotlib numpy scipy pandas Jinja2
A benchmark run can then be started with
% python benchmarks/run_bayesmark.py \
--dataset iris \
--model kNN \
--budget 80 \
--repeat 10 \
--sampler-list "TPESampler CmaEsSampler" \
--sampler-kwargs-list "{\"multivariate\":true,\"constant_liar\":true} {}" \
--pruner-list "NopPruner" \
--pruner-kwargs-list "{}"
Allowed models are `[kNN, SVM, DT, RF, MLP-sgd, ada, linear]` and allowed datasets are `[breast, digits, iris, wine, diabetes]`. For more details on the default parameters, please refer to `benchmarks/run_bayesmark.py`.
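For instance, to benchmark a random forest on the `digits` dataset instead, keep the sampler settings above and change only the problem flags (an illustrative variation, not a recommended configuration):

% python benchmarks/run_bayesmark.py \
--dataset digits \
--model RF \
--budget 80 \
--repeat 10 \
--sampler-list "TPESampler CmaEsSampler" \
--sampler-kwargs-list "{\"multivariate\":true,\"constant_liar\":true} {}" \
--pruner-list "NopPruner" \
--pruner-kwargs-list "{}"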
A markdown report can be generated after the benchmark has completed by running
% python benchmarks/bayesmark/report_bayesmark.py
You'll find the benchmark artifacts in the `plots` and `report` directories.
## Performance Benchmarks with NASLib

This workflow benchmarks the optimization algorithms available in Optuna with NASLib. NASLib provides an abstraction over a number of NAS benchmarks; currently, only NAS-Bench-201 is supported. This benchmark can be run on GitHub Actions or locally.
Please follow the same steps as in Performance Benchmarks with `kurobako`, except that you need to select `Performance benchmarks with NASLib` in Step 3.
In order to run the NASLib benchmarks, you need the following dependencies:

- NASLib and the necessary data files (currently, `nb201_cifar10_full_training.pickle`, `nb201_cifar100_full_training.pickle`, and `nb201_ImageNet16_full_training.pickle` are needed)
- `kurobako`
- `kurobako-py`
- `gnuplot`
Please see each project's page for detailed instructions. In short, NASLib can be installed by cloning the NASLib repository from GitHub, downloading all the data files under `NASLib/naslib/data/`, and running
$ pip3 install -e .
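Put together, the installation might look like the following sketch (the repository URL and the data-file location are assumptions; follow NASLib's own instructions if they differ):

# Clone NASLib (the automl/NASLib repository is assumed here)
$ git clone https://github.com/automl/NASLib.git
$ cd NASLib
# Place the downloaded nb201_*_full_training.pickle files under naslib/data/ before installing.
$ pip3 install -e .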
You also need to set up the `kurobako` command in the same way as described above. After this, `kurobako-py` can be installed with
$ pip3 install kurobako
Finally, you can run `benchmarks/run_naslib.py`.
$ python3 benchmarks/run_naslib.py \
--path-to-kurobako "" \
--name "performance-benchmarks" \
--n-runs 10 \
--n-jobs 10 \
--sampler-list "RandomSampler TPESampler" \
--sampler-kwargs-list "{} {}" \
--pruner-list "NopPruner" \
--pruner-kwargs-list "{}" \
--seed 0 \
--out-dir "out"
Please see `benchmarks/run_naslib.py` to check the arguments and their default values.