This repository is a fork of the Join Order Benchmark (JOB) project. Here, we unite the queries and the data in one place. The main goal of this repository is to simplify the deployment process.
Large tables are divided into chunks, each less than 100 MB.
The repository contains several scripts for quick deployment and launch of the benchmark:

- `schema.sql` - creates the data schema.
- `copy.sql` - copies data from the CSV files into the tables. It uses the `datadir` variable to build absolute paths to the CSV files; define it with `psql -v` (see the sketch after this list).
- `parts.sql` - the script that was used to split the tables into chunks; kept for future use.
- `job` - a template script for a quick launch of the benchmark.
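Because `datadir` is passed with single quotes embedded in the value (see the deployment example below), it expands inside `copy.sql` to a ready-made SQL string literal. A minimal sketch of that mechanism, assuming a server-side `COPY` (the actual statements in `copy.sql` may differ):

```sql
-- Hypothetical fragment in the spirit of copy.sql. :datadir expands to a
-- quoted literal such as '/home/user/jo-bench'; PostgreSQL concatenates
-- adjacent string constants separated by a newline into a single path.
-- Note: server-side COPY needs superuser or the pg_read_server_files role.
COPY aka_name FROM :datadir
    '/aka_name.csv' WITH (FORMAT csv);
```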
Also, you can take a quick look into the `scripts` folder for examples of benchmarking scripts. They are not ideal, as my recent experiments have shown, but they can help you avoid some unnecessary slips.
The example below shows the deployment procedure for all tables:

```
psql -f ~/jo-bench/schema.sql
psql -vdatadir="'$HOME/jo-bench'" -f ~/jo-bench/copy.sql
```
NB! Deployment was tested on a database created with the C locale.
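After loading, a quick sanity check (my addition, not part of the repository) can confirm that the largest tables were filled:

```sql
-- List the five biggest tables in the public schema with their sizes.
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
  FROM pg_class
 WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 5;
```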
To perform the same tests over versions of the tables partitioned by HASH into two partitions, use the `schema-hashedparts.sql` file:

```
psql -f ~/jo-bench/schema-hashedparts.sql
psql -vdatadir="'$HOME/jo-bench'" -f ~/jo-bench/copy.sql
```
It creates an additional schema named `parts` in the database and sets `search_path` to that schema, so queries will resolve to the tables from this schema. Fill these tables with data in the same way as in the non-partitioned case.
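For illustration, the pattern such a script follows can be sketched as below (a minimal sketch assuming the standard JOB definition of `movie_info`; the actual `schema-hashedparts.sql` may differ in details):

```sql
-- Create the parts schema and a two-partition HASH version of one table.
CREATE SCHEMA parts;
SET search_path = parts, public;

CREATE TABLE movie_info (
    id           integer NOT NULL,
    movie_id     integer NOT NULL,
    info_type_id integer NOT NULL,
    info         text    NOT NULL,
    note         text
) PARTITION BY HASH (id);

CREATE TABLE movie_info_p0 PARTITION OF movie_info
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE movie_info_p1 PARTITION OF movie_info
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);
```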
In case you want to vary the number of partitions and apply partitioning only to the six largest tables, use another script:

```
psql -vp=<NN> -f ~/jo-bench/schema-multiparts.sql
```

Here NN is the number of partitions you want to have on each big table. These tables will be created in the `multiparts` schema, and `search_path` will be altered to reference that schema directly.
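One way such a script can expand the `p` variable into an arbitrary number of partitions is shown below (a sketch of the technique, assuming the standard JOB definition of `cast_info`; not necessarily the literal contents of `schema-multiparts.sql`):

```sql
-- Create an NN-way HASH-partitioned cast_info in the multiparts schema.
-- Run with: psql -vp=8 -f thisfile.sql
CREATE SCHEMA multiparts;
SET search_path = multiparts, public;

CREATE TABLE cast_info (
    id             integer NOT NULL,
    person_id      integer NOT NULL,
    movie_id       integer NOT NULL,
    person_role_id integer,
    note           text,
    nr_order       integer,
    role_id        integer NOT NULL
) PARTITION BY HASH (id);

-- Generate one CREATE TABLE ... PARTITION OF statement per remainder
-- and execute each generated statement with \gexec.
SELECT format('CREATE TABLE cast_info_p%1$s PARTITION OF cast_info
               FOR VALUES WITH (MODULUS %2$s, REMAINDER %1$s);',
              g, :p)
  FROM generate_series(0, :p - 1) AS g
\gexec
```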