Commit: Benchmark 02/2018
weinberger authored and mchacki committed Feb 14, 2018
1 parent a1f1738 commit 19cb07b
Showing 34 changed files with 2,572 additions and 484 deletions.
103 changes: 49 additions & 54 deletions README.md
@@ -1,85 +1,80 @@
 # NoSQL Performance Tests
 
-This repository contains the performance tests described in my [blog](https://www.arangodb.com/2015/06/multi-model-benchmark/). Please feel free to improve the various database test drivers. If you see any optimization I have missed, please issue a pull request.
+This repository contains the performance tests described in my [blog](https://www.arangodb.com/2018/02/nosql-performance-benchmark-2018-mongodb-postgresql-orientdb-neo4j-arangodb/). Please feel free to improve the various database test drivers. If you see any optimization I have missed, please issue a pull request.
 
 The files are structured as follows:
 
-`benchmark.js` contains the test driver and all the test cases. Currently, the following tests are implemented: `shortest`, `neighbors`, `neighbors2`, `singleRead`, `singleWrite`, and `aggregation`. Use `all` to run all tests including warmup.
+`benchmark.js` contains the test driver and all the test cases. Currently, the following tests are implemented: `shortest`, `hardPath`, `neighbors`, `neighbors2`, `neighbors2data`, `singleRead`, `singleWrite` and `aggregation`. Use `all` to run all tests including warmup.
 
-`arangodb`, `neo4j`, and `mongodb` are directories containing a single file `description.js`. This description file implements the database-specific parts of the tests.
+`arangodb`, `arangodb_mmfiles`, `neo4j`, `mongodb`, `orientdb`, `postgresql_jsonb` and `postgresql_tabular` are directories containing the files `description.js`, `setup.sh` and `import.sh`. The description file implements the database-specific parts of the tests. The setup and import files are used to set up the database and import the needed dataset for the test.
 
 `data` contains the test data used for the read and write tests and the start and end vertices for the shortest path.
 
 ## Installation
 
-```
-git clone https://github.com/weinberger/nosql-tests.git
-npm install .
-npm run data
-```
+### Client
 
-The last step will uncompress the test data file.
+We need to install additional services:
 
-## Example
+    $ curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
+    $ sudo apt-get install -y make build-essential nodejs
 
-```
-node benchmark arangodb -a 1.2.3.4 -t all
-```
+Clone the test repo and uncompress the test data files.
 
-runs all tests against an ArangoDB server running on host 1.2.3.4.
+    $ git clone https://github.com/weinberger/nosql-tests.git
+    $ cd nosql-tests
+    $ npm install
+    $ npm run data
 
-## Usage
+### Server
 
-```
-node benchmark -h
-Usage: benchmark <command> [options]
+The server also needs the nosql-tests repo checked out. The folders on client and server are required to have the same path!
 
-Commands:
-  arangodb  ArangoDB benchmark
-  mongodb   MongoDB benchmark
-  neo4j     neo4j benchmark
+    $ git clone https://github.com/weinberger/nosql-tests.git
 
-Options:
-  -t, --tests      tests to run separated by comma: shortest, neighbors,
-                   neighbors2, singleRead, singleWrite, aggregation
-                                                      [string] [default: "all"]
-  -s, --restrict   restrict to that many elements (0=no restriction)
-                                                                   [default: 0]
-  -l, --neighbors  look at that many neighbors                   [default: 500]
-  -a, --address    server host                  [string] [default: "127.0.0.1"]
-  -h               Show help                                          [boolean]
-```
+For the complete setup with all databases we need several additional services:
 
-## Start Parameters
+    $ sudo apt-get install -y unzip default-jre binutils numactl collectd nodejs
 
-We have used the following parameters to start the databases.
+To install all databases and import the test dataset:
 
-**ArangoDB**
+    $ ./setupAll.sh
 
-```
-./bin/arangod /mnt/data/arangodb/data-2.7 --server.threads 16 --scheduler.threads 8 --wal.sync-interval 1000 --config etc/relative/arangod.conf --javascript.v8-contexts 17
-```
+## Run single test
 
-Admin interface: http://107.178.210.238:8529/
+To run a single test against one database, we execute `benchmark.js` via node.
 
-**MongoDB**
+    $ node benchmark.js -h
+    Usage: benchmark.js <command> [options]
+
+    Commands:
+      arangodb            ArangoDB benchmark
+      arangodb-mmfiles    ArangoDB benchmark
+      mongodb             MongoDB benchmark
+      neo4j               neo4j benchmark
+      orientdb            orientdb benchmark
+      postgresql          postgresql JSON benchmark
+      postgresql_tabular  postgresql tabular benchmark
+
+    Options:
+      --version               Show version number                     [boolean]
+      -t, --tests             tests to run separated by comma: shortest,
+                              neighbors, neighbors2, neighbors2data, singleRead,
+                              singleWrite, aggregation, hardPath, singleWriteSync
+                                                     [string] [default: "all"]
+      -s, --restrict          restrict to that many elements (0=no restriction)
+                                                                  [default: 0]
+      -l, --neighbors         look at that many neighbors      [default: 1000]
+      --ld, --neighbors2data  look at that many neighbors2 with profiles
+                                                                [default: 100]
+      -a, --address           server host      [string] [default: "127.0.0.1"]
+      -h                      Show help                             [boolean]
+
+    copyright 2018 Claudius Weinberger
 
-```
-./bin/mongod --storageEngine wiredTiger --syncdelay 1 --dbpath /mnt/data/mongodb/wired2/
-```
+## Run complete test setup
 
-**OrientDB**
+To run the complete test against every database, we simply execute `runAll.sh`.
 
-```
-./bin/server.sh -Xmx28G -Dstorage.wal.maxSize=28000
-```
+    ./runAll.sh <server-ip> <num-runs>
 
-**Neo4J**
-
-```
-./bin/neo4j start
-```
-
-Admin interface: http://107.178.210.238:7474/
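
All database drivers referenced in the README share one callback-based interface. Below is a minimal sketch of the shape a `description.js` is expected to export, inferred from the `arangodb/description.js` changes that follow; the exported names mirror that file, while every body here is an illustrative stand-in rather than a real implementation.

```
'use strict';

// Hypothetical skeleton of a description.js driver. The exported names mirror
// arangodb/description.js; the bodies are illustrative stand-ins only.
module.exports = {
  name: 'MyDatabase',

  // Open a connection to the given host and hand the handle to the test driver.
  startup: function (host, cb) {
    var db = {host: host}; // stand-in for a real connection object
    cb(db);
  },

  // Look up a collection handle; cb(err, coll).
  getCollection: function (db, name, cb) {
    cb(undefined, {name: name});
  },

  // Fetch a single document by id (used by the singleRead test).
  getDocument: function (db, coll, id, cb) {
    cb(null, {_key: id});
  },

  // Report the number of distinct out-neighbors of a vertex (neighbors test).
  neighbors: function (db, collP, collR, id, i, cb) {
    cb(null, 0);
  }
};
```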
72 changes: 40 additions & 32 deletions arangodb/description.js
100644 → 100755
@@ -1,6 +1,6 @@
 'use strict';
 
-var Database = require('arangojs');
+var arangojs = require('arangojs');
 var opts = {
   maxSockets: 25,
   keepAlive: true,
@@ -12,12 +12,10 @@ module.exports = {
   name: 'ArangoDB',
 
   startup: function (host, cb) {
-    var db = new Database({
+    var db = new arangojs.Database({
       url: 'http://' + host + ':8529',
       agent: new Agent(opts),
-      fullDocument: false,
-      promisify: false,
-      promise: false
+      fullDocument: false
     });
 
     cb(db);
@@ -30,38 +28,50 @@ module.exports = {
       module.exports.aggregate(db, coll, function (err, result) {
         if (err) return cb(err);
 
-        console.log('INFO step 1/2 done');
+        console.log('INFO step 1/3 done');
 
-        module.exports.getCollection(db, 'relations', function (err, coll) {
+        module.exports.getCollection(db, 'relations', function (err, coll2) {
           if (err) return cb(err);
 
-          module.exports.aggregate2(db, coll, function (err, result) {
+          db.route('_api/collection/relations/loadIndexesIntoMemory').put(function (err, result) {
             if (err) return cb(err);
 
-            console.log('INFO step 2/2 done');
-            console.log('INFO warmup done');
-
-            return cb(null);
+            console.log('INFO step 2/3 done');
+
+            var warmupIds = require('../data/warmup1000');
+            var goal = 1000;
+            var total = 0;
+            for (var i = 0; i < goal; i++) {
+              module.exports.getDocument(db, coll, warmupIds[i], function (err, result) {
+                if (err) return cb(err);
+
+                ++total;
+                if (total === goal) {
+                  console.log('INFO step 3/3 done');
+                  console.log('INFO warmup done');
+                  return cb(null);
+                }
+              });
+            }
           });
         });
       });
     });
   },
 
   getCollection: function (db, name, cb) {
-    db.collection(name, cb);
+    cb(undefined, db.collection(name));
   },
 
   dropCollection: function (db, name, cb) {
-    db.dropCollection(name, cb);
+    db.collection(name).drop(cb);
   },
 
   createCollection: function (db, name, cb) {
-    db.createCollection(name, cb);
+    db.collection(name).create(cb);
   },
 
   createCollectionSync: function (db, name, cb) {
-    db.createCollection({name: name, waitForSync: true}, cb);
+    db.collection(name).create({waitForSync: true}, cb);
   },
 
   getDocument: function (db, coll, id, cb) {
@@ -80,66 +90,64 @@ module.exports = {
     db.query('FOR x IN ' + coll.name + ' COLLECT age = x.AGE WITH COUNT INTO counter RETURN {age: age, amount: counter}', cb);
   },
 
-  aggregate2: function (db, coll, cb) {
-    db.query('FOR x IN ' + coll.name + ' FILTER x._from > "" COLLECT a=1 WITH COUNT INTO counter RETURN {amount: counter}', cb);
+  aggregate2: function (db, coll, coll2, cb) {
+    db.query('LET tmp = (FOR y IN ' + coll.name + ' FOR x IN ' + coll2.name + ' FILTER x._from == CONCAT("' + coll.name + '", y._key) OR x._to == CONCAT("' + coll.name + '", y._key) COLLECT a=1 WITH COUNT INTO counter RETURN {amount: counter}) RETURN LENGTH(tmp)', cb);
   },
 
   neighbors: function (db, collP, collR, id, i, cb) {
-    db.query('RETURN NEIGHBORS(' + collP.name
-      + ', ' + collR.name + ', @key, "outbound", [], {includeData:false})', {key: collP.name + '/' + id},
+    db.query('FOR v IN OUTBOUND @key ' + collR.name + ' OPTIONS {bfs: true, uniqueVertices: "global"} RETURN v._id',
+      {key: collP.name + '/' + id},
       function (err, result) {
         if (err) return cb(err);
 
         result.all(function (err, v) {
           if (err) return cb(err);
 
-          cb(null, v[0].length);
+          cb(null, v.length);
         });
       }
     );
   },
 
   neighbors2: function (db, collP, collR, id, i, cb) {
-    db.query('RETURN NEIGHBORS(' + collP.name
-      + ', ' + collR.name + ', @key, "outbound", [], {minDepth:0 , maxDepth: 2, includeData: false})', {key: collP.name + '/' + id},
+    db.query('FOR v IN 1..2 OUTBOUND @key ' + collR.name + ' OPTIONS {bfs: true, uniqueVertices: "global"} RETURN v._id',
+      {key: collP.name + '/' + id},
      function (err, result) {
         if (err) return cb(err);
 
         result.all(function (err, v) {
           if (err) return cb(err);
 
-          cb(null, v[0].length);
+          cb(null, v.length);
         });
       }
     );
   },
 
   neighbors2data: function (db, collP, collR, id, i, cb) {
-    db.query('RETURN NEIGHBORS(' + collP.name + ', ' + collR.name + ', @key, "outbound", [], {minDepth:0 , maxDepth: 2, includeData: true})',
+    db.query('FOR v IN 1..2 OUTBOUND @key ' + collR.name + ' OPTIONS {bfs: true, uniqueVertices: "global"} RETURN v',
       {key: collP.name + '/' + id},
       function (err, result) {
         if (err) return cb(err);
 
         result.all(function (err, v) {
           if (err) return cb(err);
 
-          cb(null, v[0].length);
+          cb(null, v.length);
         });
       }
     );
   },
 
   shortestPath: function (db, collP, collR, path, i, cb) {
-    db.query('RETURN SHORTEST_PATH(' + collP.name + ', ' + collR.name
-      + ', @from, @to, "outbound", {includeData: false})',
-      {from: 'profiles/' + path.from, to: 'profiles/' + path.to}, function (err, result) {
+    db.query('FOR v IN OUTBOUND SHORTEST_PATH @from TO @to ' + collR.name + ' RETURN v._id',
+      {from: 'profiles/' + path.from, to: 'profiles/' + path.to}, function (err, result) {
         if (err) return cb(err);
 
         result.all(function (err, v) {
           if (err) return cb(err);
 
-          var p = v[0];
-          cb(null, (p === null) ? 0 : (p.vertices.length - 1));
+          cb(null, (v.length === 0) ? 0 : (v.length - 1));
         });
       }
     );
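
The reworked traversal queries can also be tried outside the benchmark harness. The following is a minimal standalone sketch that issues the same one-step neighbors query with the callback-style arangojs calls used above; the host and the start vertex `profiles/P100` are assumptions for illustration, and the collection names match the import script.

```
'use strict';

var arangojs = require('arangojs');

// Assumed local server with the imported Pokec dataset.
var db = new arangojs.Database({url: 'http://127.0.0.1:8529'});

// Same AQL as the new neighbors() above: breadth-first, globally unique
// vertices, returning ids only.
db.query(
  'FOR v IN OUTBOUND @key relations OPTIONS {bfs: true, uniqueVertices: "global"} RETURN v._id',
  {key: 'profiles/P100'}, // hypothetical start vertex
  function (err, result) {
    if (err) throw err;

    result.all(function (err, v) {
      if (err) throw err;
      console.log('distinct out-neighbors: ' + v.length);
    });
  }
);
```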
78 changes: 37 additions & 41 deletions arangodb/import.sh
@@ -1,60 +1,56 @@
 #!/bin/bash
-set -e
 
-ARANGODB=${1-.}
+# Pass system or path to the source directory as first argument. If no argument
+# is given the current directory will be assumed to be the source directory!
+# The build directory MUST be as "build" in the source directory
 
-echo "ARANGODB DIRECTORY: $ARANGODB"
+echo "Usage [pokec.db] [path-to-arangodb] [path-to-benchmark]"
+ARANGODB=${2-databases/arangodb}
+DB=$ARANGODB/data/databases/${1-pokec}
+BENCHMARK=${3-`pwd`}
+TMP=/tmp/nosqlbenchmark
+DOWNLOADS=$TMP/downloads
 
 # import: POKEC Dataset from Stanford Snap
 # https://snap.stanford.edu/data/soc-pokec-readme.txt
+PROFILES_IN=$DOWNLOADS/soc-pokec-profiles.txt.gz
+PROFILES_OUT=$DOWNLOADS/soc-pokec-profiles-arangodb.txt.gz
 
-if [ ! -f soc-pokec-profiles.txt.gz ]; then
-  echo "Downloading PROFILES"
-  curl -OL https://snap.stanford.edu/data/soc-pokec-profiles.txt.gz
-fi
+RELATIONS_IN=$DOWNLOADS/soc-pokec-relationships.txt.gz
+RELATIONS_OUT=$DOWNLOADS/soc-pokec-relationships-arangodb.txt.gz
 
-if [ ! -f soc-pokec-relationships.txt.gz ]; then
-  echo "Downloading RELATIONS"
-  curl -OL https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz
-fi
+echo "DATABASE: $DB"
+echo "ARANGODB DIRECTORY: $ARANGODB"
+echo "BENCHMARK DIRECTORY: $BENCHMARK"
+echo "DOWNLOAD DIRECTORY: $DOWNLOADS"
 
-if [ ! -f soc-pokec-profiles-arangodb.txt ]; then
+$BENCHMARK/downloadData.sh
+
+set -e
+
+if [ ! -f $PROFILES_OUT ]; then
   echo "Converting PROFILES"
-  echo '_key public completion_percentage gender region last_login registration AGE body I_am_working_in_field spoken_languages hobbies I_most_enjoy_good_food pets body_type my_eyesight eye_color hair_color hair_type completed_level_of_education favourite_color relation_to_smoking relation_to_alcohol sign_in_zodiac on_pokec_i_am_looking_for love_is_for_me relation_to_casual_sex my_partner_should_be marital_status children relation_to_children I_like_movies I_like_watching_movie I_like_music I_mostly_like_listening_to_music the_idea_of_good_evening I_like_specialties_from_kitchen fun I_am_going_to_concerts my_active_sports my_passive_sports profession I_like_books life_style music cars politics relationships art_culture hobbies_interests science_technologies computers_internet education sport movies travelling health companies_brands more' > soc-pokec-profiles-arangodb.txt
-  gunzip < soc-pokec-profiles.txt.gz | sed -e 's/null//g' -e 's~^~P~' -e 's~ $~~' >> soc-pokec-profiles-arangodb.txt
+  echo '_key public completion_percentage gender region last_login registration AGE body I_am_working_in_field spoken_languages hobbies I_most_enjoy_good_food pets body_type my_eyesight eye_color hair_color hair_type completed_level_of_education favourite_color relation_to_smoking relation_to_alcohol sign_in_zodiac on_pokec_i_am_looking_for love_is_for_me relation_to_casual_sex my_partner_should_be marital_status children relation_to_children I_like_movies I_like_watching_movie I_like_music I_mostly_like_listening_to_music the_idea_of_good_evening I_like_specialties_from_kitchen fun I_am_going_to_concerts my_active_sports my_passive_sports profession I_like_books life_style music cars politics relationships art_culture hobbies_interests science_technologies computers_internet education sport movies travelling health companies_brands more' > $PROFILES_OUT
+  gunzip < $PROFILES_IN | sed -e 's/null//g' -e 's~^~P~' >> $PROFILES_OUT
 fi
 
-if [ ! -f soc-pokec-relationships-arangodb.txt ]; then
+if [ ! -f $RELATIONS_OUT ]; then
   echo "Converting RELATIONS"
-  echo '_from _to' > soc-pokec-relationships-arangodb.txt
-  gzip -dc soc-pokec-relationships.txt.gz | awk -F"\t" '{print "profiles/P" $1 "\tprofiles/P" $2}' >> soc-pokec-relationships-arangodb.txt
+  echo '_from _to' > $RELATIONS_OUT
+  gzip -dc $RELATIONS_IN | awk -F"\t" '{print "profiles/P" $1 "\tprofiles/P" $2}' >> $RELATIONS_OUT
 fi
 
-INPUT_PROFILES=`pwd`/soc-pokec-profiles-arangodb.txt
-INPUT_RELATIONS=`pwd`/soc-pokec-relationships-arangodb.txt
-
-if [ "$ARANGODB" == "system" ]; then
-  ARANGOSH=/usr/bin/arangosh
-  ARANGOSH_CONF=/etc/arangodb/arangosh.conf
-  ARANGOIMP=/usr/bin/arangoimp
-  ARANGOIMP_CONF=/etc/arangodb/arangoimp.conf
-  APATH=.
-else
-  ARANGOSH=./bin/arangosh
-  ARANGOSH_CONF=./etc/relative/arangosh.conf
-  ARANGOIMP=./bin/arangoimp
-  ARANGOIMP_CONF=./etc/relative/arangoimp.conf
-  APATH=$ARANGODB
-fi
+ARANGOSH=$ARANGODB/usr/bin/arangosh
+ARANGOSH_CONF=$ARANGODB/etc/arangodb3/arangosh.conf
+ARANGOIMP=$ARANGODB/usr/bin/arangoimp
+ARANGOIMP_CONF=$ARANGODB/etc/arangodb3/arangoimp.conf
+APATH="$ARANGODB"
 
 (
-cd $APATH
-
+cd "$APATH" || { echo "failed to change into ${APATH}" ; exit 1; }
 $ARANGOSH -c $ARANGOSH_CONF << 'EOF'
-var db = require("org/arangodb").db;
+var db = require("@arangodb").db;
 db._create("profiles");
-db._createEdgeCollection("relations");
+db._createEdgeCollection("relations", {keyOptions: { type: "autoincrement", offset: 0 } })
 EOF
-$ARANGOIMP -c $ARANGOIMP_CONF --type tsv --collection profiles --file $INPUT_PROFILES
-$ARANGOIMP -c $ARANGOIMP_CONF --type tsv --collection relations --file $INPUT_RELATIONS
+$ARANGOIMP -c $ARANGOIMP_CONF --server.authentication false --type tsv --collection profiles --file $PROFILES_OUT --threads 8
+$ARANGOIMP -c $ARANGOIMP_CONF --server.authentication false --type tsv --collection relations --file $RELATIONS_OUT --threads 8
 )
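
After the import it is worth sanity-checking the collection counts from arangosh; the Pokec dataset has roughly 1.6 million profiles and 30.6 million relations. A small sketch, assuming arangosh is connected to the database the script just filled:

```
// Run inside arangosh; module and collection names come from the script above.
var db = require("@arangodb").db;

print("profiles:  " + db.profiles.count());   // expect roughly 1.6M documents
print("relations: " + db.relations.count());  // expect roughly 30.6M edges
```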