Merged PR 291952: Add benchmark to napajs.

zuimeiaj · Jun 12, 2017 · a641034
1 parent 0f42cf8
commit a641034
Showing 12 changed files with 612 additions and 1 deletion.
diff --git a/benchmark/README.md b/benchmark/README.md
@@ -0,0 +1,156 @@
+# Benchmark
+
+## Summary:
+- JavaScript execution in napajs is on par with node, using the same version of V8, which is expected.
+- `zone.execute` scales linearly with the number of workers, which is expected.
+- After warm-up, the overhead of calling `zone.execute` from node is around 0.1ms, and that of `zone.executeSync` is around 0.2ms.
+- `transport.marshall` cost on small plain JavaScript values is about 3x that of JSON.stringify.
+- The overhead of `store.set` and `store.get` is around 0.06ms plus transport overhead on the objects.
+
+## Napa vs. Node on JavaScript execution
+Please refer to [node-napa-perf-comparison.ts](node-napa-perf-comparison.ts).
+
+| node time | napa time |
+| --------- | --------- |
+| 3026.76   | 3025.81   |
+
+## Linear scalability
+`zone.execute` scales linearly with the number of workers. We performed 1M CRC32 calls on a 1024-length string on each worker; the numbers are below. We still need to understand why some runs with more workers in parallel finish faster than runs with fewer workers.
+
+|           | node    | napa - 1 worker | napa - 2 workers | napa - 4 workers | napa - 8 workers |
+| --------- | ------- | --------------- | ---------------- | ---------------- | ---------------- |
+| time (ms) | 8649.52 | 6146.98         | 4912.57          | 4563.48          | 6168.41          |
+| cpu%      | ~15%    | ~15%            | ~27%             | ~55%             | ~99%             |
+
+Please refer to [execute-scalability.ts](./execute-scalability.ts) for test details.
+
+## Execute overhead
+The overhead of `zone.execute` includes:
+1. marshalling cost of arguments in caller thread.
+2. queuing time before a worker can execute.
+3. unmarshalling cost of arguments in target worker.
+4. marshalling cost of return value from target worker.
+5. (executeSync only) internal delay in waiting for execution to finish.
+6. queuing time before caller callback is notified.
+7. unmarshalling cost of return value in caller thread.
+
+In this section we examine #2, #5 and #6, so we use an empty function with no arguments and no return value.
+
+Transport overhead (#1, #3, #4, #7) varies with the size and complexity of the payload, and is benchmarked separately in the [Transport Overhead](#transport-overhead) section.
+
+Please refer to [execute-overhead.ts](./execute-overhead.ts) for test details.
+
+### Overhead after warm-up
+Average overhead after warm-up is around 0.07ms to 0.12ms per call for `zone.execute`, and around 0.16ms to 0.18ms per call for `zone.executeSync`.
+
+| repeat   | zone.execute (ms) | zone.executeSync (ms) |
+|----------|-------------------|-----------------------|
+| 200      | 24.932            | 31.55                 |
+| 5000     | 456.893           | 905.972               |
+| 10000    | 810.687           | 1799.866              |
+| 50000    | 3387.361          | 8169.023              |
+
+### Overhead during warm-up:
+
+| Sequence of call | Time (ms) |
+|------------------|-----------|
+| 1                |  6.040    |
+| 2                |  4.065    |
+| 3                |  5.250    |
+| 4                |  4.652    |
+| 5                |  1.572    |
+| 6                |  1.366    |
+| 7                |  1.403    |
+| 8                |  1.213    |
+| 9                |  0.450    |
+| 10               |  0.324    |
+| 11               |  0.193    |
+| 12               |  0.238    |
+| 13               |  0.191    |
+| 14               |  0.230    |
+| 15               |  0.203    |
+| 16               |  0.188    |
+| 17               |  0.188    |
+| 18               |  0.181    |
+| 19               |  0.185    |
+| 20               |  0.182    |
+
+
+## Transport overhead
+
+The overhead of `transport.marshall` includes:
+1. overhead of needing a replacer callback during JSON.stringify (even an empty callback slows down JSON.stringify significantly).
+2. traversing every value during JSON.stringify, to check the value type and get `cid` to put into the payload.
+    - a. If value doesn't need special care.
+    - b. If value is a transportable object that needs special care.
+
+The cost of #2.b depends on the individual transportable class, so we examine only #1 and #2.a in this test.
+
+The overhead of `transport.unmarshall` includes:
+1. overhead of needing a reviver callback during JSON.parse.
+2. traversing every value during JSON.parse, to check if the object has a `_cid` property.
+    - a. If value doesn't have property `_cid`.
+    - b. Otherwise, find constructor and call the [`Transportable.marshall`](../docs/api/transport.md#transportable-marshall).
+
+Again, we evaluate only #1 and #2.a in this test.
+
+Please refer to [transport-overhead.ts](./transport-overhead.ts) for test details.
+
+\*All operations are repeated 1000 times.
+
+| payload type                       | size  | JSON.stringify (ms) | transport.marshall (ms) | JSON.parse (ms) | transport.unmarshall (ms) |
+| ---------------------------------- | ----- | ------------------- | ----------------------- | --------------- | ------------------------- |
+| 1 level - 10 integers              | 91    | 4.90                | 18.05 (3.68x)           | 3.50            | 17.98 (5.14x)             |
+| 1 level - 100 integers             | 1081  | 65.45               | 92.78 (1.42x)           | 20.45           | 122.25 (5.98x)            |
+| 10 level - 2 integers              | 18415 | 654.40              | 2453.37 (3.75x)         | 995.02          | 2675.72 (2.69x)           |
+| 2 level - 10 integers              | 991   | 19.74               | 66.82 (3.39x)           | 27.85           | 138.45 (4.97x)            |
+| 3 level - 5 integers               | 1396  | 33.66               | 146.33 (4.35x)          | 51.54           | 189.07 (3.67x)            |
+| 1 level - 10 strings - length 10   | 201   | 3.81                | 10.17 (2.67x)           | 9.46            | 20.81 (2.20x)             |
+| 1 level - 100 strings - length 10  | 2191  | 76.53               | 115.74 (1.51x)          | 77.71           | 181.24 (2.33x)            |
+| 2 level - 10 strings - length 10   | 2091  | 30.15               | 97.65 (3.24x)           | 95.51           | 213.20 (2.23x)            |
+| 3 level - 5 strings - length 10    | 2646  | 41.95               | 155.42 (3.71x)          | 123.82          | 227.90 (1.84x)            |
+| 1 level - 10 strings - length 100  | 1101  | 7.74                | 12.19 (1.57x)           | 17.34           | 29.83 (1.72x)             |
+| 1 level - 100 strings - length 100 | 11191 | 66.17               | 112.83 (1.71x)          | 197.67          | 282.63 (1.43x)            |
+| 2 level - 10 strings - length 100  | 11091 | 68.46               | 149.99 (2.19x)          | 202.85          | 298.19 (1.47x)            |
+| 3 level - 5 strings - length 100   | 13896 | 89.46               | 208.21 (2.33x)          | 265.25          | 418.42 (1.58x)            |
+| 1 level - 10 booleans              | 126   | 2.84                | 8.14 (2.87x)            | 3.06            | 14.20 (4.65x)             |
+| 1 level - 100 booleans             | 1341  | 20.28               | 59.36 (2.93x)           | 21.59           | 121.15 (5.61x)            |
+| 2 level - 10 booleans              | 1341  | 23.92               | 89.62 (3.75x)           | 31.84           | 137.92 (4.33x)            |
+| 3 level - 5 booleans               | 1821  | 36.15               | 138.24 (3.82x)          | 55.71           | 195.50 (3.51x)            |
+
+## Store access overhead
+
+The overhead of `store.set` includes:
+1. overhead of calling `transport.marshall` on value.
+2. overhead of putting marshalled data and transport context into C++ map (with exclusive_lock).
+
+The overhead of `store.get` includes:
+1. overhead of getting marshalled data and transport context from C++ map (with shared_lock).
+2. overhead of calling `transport.unmarshall` on marshalled data.
+
+For `store.set`, the numbers below indicate that the cost beyond marshalling is around 0.07~0.4ms, varying with payload size (10B to 18KB). `store.get` takes a bit more: 0.06~0.9ms over the same range of payload sizes. If the value in the store is not updated frequently, it is a good idea to cache it on the JavaScript side.
+
+Please refer to [store-overhead.ts](./store-overhead.ts) for test details.
+
+\*All operations are repeated 1000 times.
+
+| payload type                       | size  | transport.marshall (ms) | store.set (ms)  | transport.unmarshall (ms) | store.get (ms) |
+| ---------------------------------- | ----- | ----------------------- | --------------- | ------------------------- | -------------- |
+| 1 level - 1 integer                | 10    | 2.54                    | 73.85           | 3.98                      | 65.57          |
+| 1 level - 10 integers              | 91    | 8.27                    | 98.55           | 17.23                     | 90.89          |
+| 1 level - 100 integers             | 1081  | 97.10                   | 185.31          | 144.75                    | 274.39         |
+| 10 level - 2 integers              | 18415 | 2525.18                 | 2973.17         | 3093.06                   | 3927.80        |
+| 2 level - 10 integers              | 991   | 71.22                   | 174.01          | 154.76                    | 276.04         |
+| 3 level - 5 integers               | 1396  | 127.06                  | 219.73          | 182.27                    | 337.59         |
+| 1 level - 10 strings - length 10   | 201   | 14.43                   | 79.68           | 31.28                     | 84.71          |
+| 1 level - 100 strings - length 10  | 2191  | 104.40                  | 212.44          | 173.32                    | 239.09         |
+| 2 level - 10 strings - length 10   | 2091  | 79.54                   | 188.72          | 189.29                    | 252.83         |
+| 3 level - 5 strings - length 10    | 2646  | 155.14                  | 257.78          | 276.22                    | 342.95         |
+| 1 level - 10 strings - length 100  | 1101  | 15.22                   | 89.84           | 30.87                     | 88.18          |
+| 1 level - 100 strings - length 100 | 11191 | 119.89                  | 284.05          | 287.17                    | 403.77         |
+| 2 level - 10 strings - length 100  | 11091 | 137.10                  | 299.32          | 244.13                    | 297.12         |
+| 3 level - 5 strings - length 100   | 13896 | 183.84                  | 310.89          | 285.80                    | 363.50         |
+| 1 level - 10 booleans              | 126   | 5.74                    | 49.89           | 22.69                     | 97.27          |
+| 1 level - 100 booleans             | 1341  | 57.41                   | 157.80          | 106.30                    | 218.05         |
+| 2 level - 10 booleans              | 1341  | 76.93                   | 150.25          | 104.02                    | 185.82         |
+| 3 level - 5 booleans               | 1821  | 102.47                  | 171.44          | 150.42                    | 207.27         |
diff --git a/benchmark/bench-utils.ts b/benchmark/bench-utils.ts
@@ -0,0 +1,47 @@
+/// <summary> Utility functions to generate objects for testing. </summary>
+
+export function generateString(length: number): string {
+    return Array(length).join('x');  // Note: yields (length - 1) characters.
+}
+
+export function generateObject(keys: number, depth: number, valueType: string = "string", valueLength = 7) {
+    let object: any = {};
+    for (let i = 0; i < keys; ++i) {
+        let key = `key${i}`;
+        let value: any = null;
+        if (depth > 1) {
+            value = generateObject(keys, depth - 1, valueType, valueLength);
+        }
+        else if (valueType === 'string') {
+            // We try to make each string different.
+            value = generateString(valueLength - 1) + (depth * keys + i);
+        }
+        else if (valueType === 'number') {
+            value = i;
+        }
+        else if (valueType === 'boolean') {
+            value = i % 2 === 0;
+        }
+        object[key] = value;
+    }
+    return object;
+}
+
+export function timeDiffInMs(diff: [number, number]): number {
+    return (diff[0] * 1e9 + diff[1]) / 1e6;
+}
+
+export function formatTimeDiff(diff: number | [number, number], printUnit: boolean = false): string {
+    if (Array.isArray(diff)) {
+        diff = timeDiffInMs(diff);
+    }
+    let message = diff.toFixed(2);
+    if (printUnit) {
+        message += "ms";
+    }
+    return message;
+}
+
+export function formatRatio(dividend: number, divisor: number): string {
+    return "(" + (dividend / divisor).toFixed(2) + "x)";
+}
diff --git a/benchmark/bench.ts b/benchmark/bench.ts
@@ -0,0 +1,3 @@
+import { bench } from './index';
+
+bench();
diff --git a/benchmark/execute-overhead.ts b/benchmark/execute-overhead.ts
@@ -0,0 +1,50 @@
+import * as napa from 'napajs';
+import * as mdTable from 'markdown-table';
+import { formatTimeDiff } from './bench-utils';
+
+export function bench(zone: napa.zone.Zone): Promise<void> {
+    console.log("Benchmarking execute overhead...");
+
+    // Prepare an empty function.
+    zone.broadcastSync("function test() {}");
+    const ARGS = [1, "hello", {a: 1, b: true}];
+
+    // Warm-up.
+    const WARMUP_REPEAT: number = 20;
+
+    console.log("## Execute overhead during warmup\n");
+    let warmupTable = [];
+    warmupTable.push(["call #", "time (ms)"]);
+    for (let i = 0; i < WARMUP_REPEAT; ++i) {
+        let start = process.hrtime();
+        zone.executeSync("", "test", ARGS);
+        warmupTable.push([(i + 1).toString(), formatTimeDiff(process.hrtime(start))]);
+    }
+    console.log(mdTable(warmupTable));
+
+    // executeSync after warm-up
+    const REPEAT: number = 10000;
+    console.log("## `zone.executeSync` overhead\n");
+    let start = process.hrtime();
+    for (let i = 0; i < REPEAT; ++i) {
+        zone.executeSync("", "test", ARGS);
+    }
+    console.log(`Elapsed time of running an empty function ${REPEAT} times: ${formatTimeDiff(process.hrtime(start), true)}\n`);
+
+    // execute after warm-up
+    return new Promise<void>((resolve, reject) => {
+        let finished = 0;
+        let start = process.hrtime();
+        for (let i = 0; i < REPEAT; ++i) {
+            zone.execute("", "test", ARGS).then(() => {
+                ++finished;
+                if (finished === REPEAT) {
+                    console.log("## `zone.execute` overhead\n");
+                    console.log(`Elapsed time of running an empty function ${REPEAT} times: ${formatTimeDiff(process.hrtime(start), true)}`);
+                    console.log('');
+                    resolve();
+                }
+            });
+        }
+    });
+}
diff --git a/benchmark/execute-scalability.ts b/benchmark/execute-scalability.ts
@@ -0,0 +1,90 @@
+import * as napa from 'napajs';
+import * as assert from 'assert';
+import * as mdTable from 'markdown-table';
+import { formatTimeDiff } from './bench-utils';
+
+function makeCRCTable(){
+    var c;
+    var crcTable = [];
+    for(var n =0; n < 256; n++){
+        c = n;
+        for(var k =0; k < 8; k++){
+            c = ((c&1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1));
+        }
+        crcTable[n] = c;
+    }
+    return crcTable;
+}
+
+let crcTable = makeCRCTable();
+function crc32(str: string) {
+    let crc = 0 ^ (-1);
+    for (var i = 0; i < str.length; i++ ) {
+        crc = (crc >>> 8) ^ crcTable[(crc ^ str.charCodeAt(i)) & 0xFF];
+    }
+
+    return (crc ^ (-1)) >>> 0;
+}
+
+function testCrc() {
+    const REPEAT: number = 1000000;
+    let result = 0;
+    let key = Array(1024).join('x');
+    for (let i = 0; i < REPEAT; ++i) {
+        let hash = crc32(key);
+        result = result ^ hash;
+    }
+    return result;
+}
+
+export function bench(zone: napa.zone.Zone): Promise<void> {
+    console.log("Benchmarking execute scalability...");
+
+    // Broadcast the CRC functions to all workers in the zone.
+    zone.broadcastSync(makeCRCTable.toString());
+    zone.broadcastSync("var crcTable = makeCRCTable();");
+    zone.broadcastSync(crc32.toString());
+    zone.broadcastSync(testCrc.toString());
+
+    // Warm-up.
+    let crcResult = testCrc();
+    zone.broadcastSync("testCrc()");
+
+    // Execute in Node with 1 thread.
+    let start = process.hrtime();
+    assert(testCrc() === crcResult);
+    let nodeTime = formatTimeDiff(process.hrtime(start));
+
+    let executeTime: { [workers: number]: string } = {};
+    let scalabilityTest = function(workers: number): Promise<void> {
+        let finished = 0;
+        let start = process.hrtime();
+
+        return new Promise<void>((resolve, reject) => {
+            for (let i = 0; i < workers; ++i) {
+                zone.execute("", "testCrc", []).then((result: napa.zone.ExecuteResult) => {
+                    assert(crcResult === result.value);
+                    ++finished;
+                    if (finished === workers) {
+                        executeTime[workers] = formatTimeDiff(process.hrtime(start));
+                        resolve();
+                    }
+                });
+            }
+        });
+    };
+
+    // Execute from 1 worker to 8 workers.
+    return scalabilityTest(1)
+        .then(() => scalabilityTest(2))
+        .then(() => scalabilityTest(4))
+        .then(() => scalabilityTest(8))
+        .then(() => {
+            console.log("## Execute scalability\n");
+            console.log(mdTable([
+                ["node", "napa - 1 worker", "napa - 2 workers", "napa - 4 workers", "napa - 8 workers"],
+                [nodeTime, executeTime[1], executeTime[2], executeTime[4], executeTime[8]]
+            ]));
+            console.log('');
+        });
+}
diff --git a/benchmark/index.ts b/benchmark/index.ts
@@ -0,0 +1,20 @@
+import * as napa from 'napajs';
+import * as nodeNapaPerfComp from './node-napa-perf-comparison';
+import * as executeOverhead from './execute-overhead';
+import * as executeScalability from './execute-scalability';
+import * as transportOverhead from './transport-overhead';
+import * as storeOverhead from './store-overhead';
+
+export function bench(): Promise<void> {
+    // Non-zone related benchmarks.
+    transportOverhead.bench();
+    storeOverhead.bench();
+
+    // Create zones for execute related benchmark.
+    let singleWorkerZone = napa.zone.create('single-worker-zone', { workers: 1 });
+    let multiWorkerZone = napa.zone.create('multi-worker-zone', { workers: 8 });
+
+    return nodeNapaPerfComp.bench(singleWorkerZone)
+        .then(() => { return executeOverhead.bench(singleWorkerZone); })
+        .then(() => { return executeScalability.bench(multiWorkerZone); });
+}