Merged PR 291952: Add benchmark to napajs.
daiyip authored and Daiyi Peng committed Jun 12, 2017
1 parent 0f42cf8 commit a641034
Showing 12 changed files with 612 additions and 1 deletion.
156 changes: 156 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,156 @@
# Benchmark

## Summary
- JavaScript execution in napajs is on par with node, using the same version of V8, which is expected.
- `zone.execute` scales linearly on number of workers, which is expected.
- The overhead of calling `zone.execute` from node is around 0.1ms after warm-up; for `zone.executeSync` it is around 0.2ms.
- `transport.marshall` on small plain JavaScript values costs about 3x as much as `JSON.stringify`.
- The overhead of `store.set` and `store.get` is around 0.06ms plus transport overhead on the objects.

## Napa vs. Node on JavaScript execution
Please refer to [node-napa-perf-comparison.ts](node-napa-perf-comparison.ts).

| node time | napa time |
| --------- | --------- |
| 3026.76 | 3025.81 |

## Linear scalability
`zone.execute` scales linearly with the number of workers. We performed 1M CRC32 calls on a 1024-length string on each worker; the numbers are below. We still need to understand why runs with more workers in parallel can finish faster than runs with fewer workers.

| | node | napa - 1 worker | napa - 2 workers | napa - 4 workers | napa - 8 workers |
| ------- | ----------- | --------------- | ---------------- | ---------------- | ---------------- |
| time | 8,649521600 | 6146.98 | 4912.57 | 4563.48 | 6168.41 |
| cpu% | ~15% | ~15% | ~27% | ~55% | ~99% |

Please refer to [execute-scalability.ts](./execute-scalability.ts) for test details.
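
To read the table as a scaling curve, it helps to normalize by the amount of work: every worker runs its own 1M CRC32 calls, so total work grows with the worker count. The sketch below is an illustration and not part of the benchmark suite. It takes the napa timings from the table (assumed to be in milliseconds, as produced by `formatTimeDiff`) and derives aggregate throughput and speedup relative to one worker. The names `CALLS_PER_WORKER`, `throughput` and `scaling` are introduced here for illustration only.

```typescript
// Napa timings (ms) copied from the table above, keyed by worker count.
const CALLS_PER_WORKER = 1000000;
const napaTimings: Array<[number, number]> = [
    [1, 6146.98],
    [2, 4912.57],
    [4, 4563.48],
    [8, 6168.41],
];

// Aggregate throughput in CRC32 calls per millisecond.
function throughput(workers: number, elapsedMs: number): number {
    return (workers * CALLS_PER_WORKER) / elapsedMs;
}

// Baseline: the single-worker run.
const singleWorker = throughput(1, 6146.98);

// Speedup is the throughput relative to the single-worker run.
const scaling = napaTimings.map(([workers, elapsedMs]) => ({
    workers: workers,
    callsPerMs: throughput(workers, elapsedMs),
    speedup: throughput(workers, elapsedMs) / singleWorker,
}));

for (const row of scaling) {
    console.log(`${row.workers} worker(s): ${row.callsPerMs.toFixed(1)} calls/ms, ${row.speedup.toFixed(2)}x`);
}
```

With these numbers the 8-worker speedup comes out just under 8x, while the 2-worker and 4-worker runs come out above 2x and 4x, which is the anomaly noted above.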

## Execute overhead
The overhead of `zone.execute` includes
1. marshalling cost of arguments in caller thread.
2. queuing time before a worker can execute.
3. unmarshalling cost of arguments in target worker.
4. marshalling cost of return value from target worker.
5. (executeSync only) internal delay in waiting for execute finish.
6. queuing time before caller callback is notified.
7. unmarshalling cost of return value in caller thread.

In this section we examine #2, #5 and #6, so we use an empty function with no arguments and no return value.

Transport overhead (#1, #3, #4, #7) varies with the size and complexity of the payload and is benchmarked separately in the [Transport Overhead](#transport-overhead) section.

Please refer to [execute-overhead.ts](./execute-overhead.ts) for test details.

### Overhead after warm-up
Average overhead per call is around 0.07ms to 0.12ms for `zone.execute`, and around 0.16ms to 0.18ms for `zone.executeSync`. The table lists the total time for each repeat count.

| repeat | zone.execute (ms) | zone.executeSync (ms) |
|----------|-------------------|-----------------------|
| 200 | 24.932 | 31.55 |
| 5000 | 456.893 | 905.972 |
| 10000 | 810.687 | 1799.866 |
| 50000 | 3387.361 | 8169.023 |
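
The per-call averages follow from dividing each total by its repeat count. The sketch below does that arithmetic on the figures in the table; `OverheadRow` and `perCallOverhead` are names introduced here for illustration and do not appear in the benchmark sources.

```typescript
// Totals (ms) copied from the table above.
interface OverheadRow {
    repeat: number;
    executeMs: number;
    executeSyncMs: number;
}

const overheadRows: OverheadRow[] = [
    { repeat: 200, executeMs: 24.932, executeSyncMs: 31.55 },
    { repeat: 5000, executeMs: 456.893, executeSyncMs: 905.972 },
    { repeat: 10000, executeMs: 810.687, executeSyncMs: 1799.866 },
    { repeat: 50000, executeMs: 3387.361, executeSyncMs: 8169.023 },
];

// Average time per call (ms) for each repeat count.
const perCallOverhead = overheadRows.map(row => ({
    repeat: row.repeat,
    executeMs: row.executeMs / row.repeat,
    executeSyncMs: row.executeSyncMs / row.repeat,
}));

for (const row of perCallOverhead) {
    console.log(`${row.repeat}: execute ${row.executeMs.toFixed(3)}ms, executeSync ${row.executeSyncMs.toFixed(3)}ms`);
}
```

The `zone.execute` average drops as the repeat count grows, while the `zone.executeSync` average stays in a narrow band.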

### Overhead during warm-up

| Sequence of call | Time (ms) |
|------------------|-----------|
| 1 | 6.040 |
| 2 | 4.065 |
| 3 | 5.250 |
| 4 | 4.652 |
| 5 | 1.572 |
| 6 | 1.366 |
| 7 | 1.403 |
| 8 | 1.213 |
| 9 | 0.450 |
| 10 | 0.324 |
| 11 | 0.193 |
| 12 | 0.238 |
| 13 | 0.191 |
| 14 | 0.230 |
| 15 | 0.203 |
| 16 | 0.188 |
| 17 | 0.188 |
| 18 | 0.181 |
| 19 | 0.185 |
| 20 | 0.182 |
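
One way to summarize the warm-up curve is to compare the first call with the steady state. The sketch below assumes, purely for illustration, that the last five calls in the table represent the steady state; `warmupMs`, `mean` and `steadyStateMs` are names introduced here.

```typescript
// Per-call times (ms) copied from the warm-up table above, in call order.
const warmupMs: number[] = [
    6.040, 4.065, 5.250, 4.652, 1.572, 1.366, 1.403, 1.213, 0.450, 0.324,
    0.193, 0.238, 0.191, 0.230, 0.203, 0.188, 0.188, 0.181, 0.185, 0.182,
];

function mean(values: number[]): number {
    let sum = 0;
    for (const v of values) {
        sum += v;
    }
    return sum / values.length;
}

// Assumption: calls 16 to 20 are treated as the steady state.
const steadyStateMs = mean(warmupMs.slice(-5));

// How much slower the very first call is than the steady state.
const firstCallSlowdown = warmupMs[0] / steadyStateMs;

console.log(`steady state: ${steadyStateMs.toFixed(4)}ms, first call: ${firstCallSlowdown.toFixed(1)}x slower`);
```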


## Transport overhead

The overhead of `transport.marshall` includes
1. overhead of needing replacer callback during JSON.stringify. (even an empty callback will slow down JSON.stringify significantly)
2. traverse every value during JSON.stringify, to check value type and get `cid` to put into payload.
- a. If value doesn't need special care.
- b. If value is a transportable object that needs special care.

2.b depends on the individual transportable class and varies from class to class, so this test examines only #1 and #2.a.
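
The effect in #1 can be observed without napajs. The sketch below is not the code in `transport-overhead.ts`; it times plain `JSON.stringify` against the same call with a replacer that returns every value unchanged. The payload shape, `identityReplacer`, `timeMs` and the repeat count are assumptions made for illustration, and the measured ratio will differ by machine and V8 version.

```typescript
// Hypothetical payload, similar in shape to the "1 level - 10 integers" case.
const payload: { [key: string]: number } = {};
for (let i = 0; i < 10; ++i) {
    payload[`key${i}`] = i;
}

// A replacer that changes nothing; it only forces a callback for every value.
const identityReplacer = (_key: string, value: unknown): unknown => value;

// Run `fn` `repeat` times and return the elapsed wall-clock time in ms.
function timeMs(repeat: number, fn: () => void): number {
    const start = Date.now();
    for (let i = 0; i < repeat; ++i) {
        fn();
    }
    return Date.now() - start;
}

const REPEAT = 100000;
const plainMs = timeMs(REPEAT, () => { JSON.stringify(payload); });
const replacerMs = timeMs(REPEAT, () => { JSON.stringify(payload, identityReplacer); });

console.log(`plain: ${plainMs}ms, with identity replacer: ${replacerMs}ms`);
```

Both calls produce the same JSON text, so any difference in time comes from invoking the callback.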

The overhead of `transport.unmarshall` includes
1. overhead of needing reviver callback during JSON.parse.
2. traverse every value during JSON.parse, to check if object has `_cid` property.
- a. If value doesn't have property `_cid`.
- b. Otherwise, find constructor and call [`Transportable.unmarshall`](../docs/api/transport.md#transportable-marshall).

As with marshalling, this test evaluates only #1 and #2.a.
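
To make the two branches concrete, here is a minimal reviver sketch. It is not napajs's implementation: the `registry` lookup, the `Point` class and the factory signature are hypothetical stand-ins for "find constructor and rebuild the object". Values without `_cid` take the cheap pass-through path (2.a); values with `_cid` take the class-specific path (2.b).

```typescript
type Payload = { [key: string]: unknown };
type Factory = (payload: Payload) => unknown;

// Hypothetical transportable class used only for this example.
class Point {
    constructor(public x: number, public y: number) {}
}

// Hypothetical registry from class id to a factory.
const registry: { [cid: string]: Factory } = {
    Point: (payload) => new Point(Number(payload["x"]), Number(payload["y"])),
};

function reviver(_key: string, value: unknown): unknown {
    // Case 2.a: primitives, arrays and plain objects without `_cid` pass through.
    if (typeof value !== "object" || value === null || !("_cid" in value)) {
        return value;
    }
    // Case 2.b: look up the factory by class id and rebuild the instance.
    const payload = value as Payload;
    const cid = String(payload["_cid"]);
    const factory = Object.prototype.hasOwnProperty.call(registry, cid) ? registry[cid] : undefined;
    return factory ? factory(payload) : value;
}

const revived = JSON.parse('{"a":1,"p":{"_cid":"Point","x":3,"y":4}}', reviver);
console.log(revived.p instanceof Point);
```

`JSON.parse` calls the reviver for every value, innermost first, which is why even the pass-through branch adds measurable cost.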

Please refer to [transport-overhead.ts](./transport-overhead.ts) for test details.

\*All operations are repeated 1000 times.

| payload type | size | JSON.stringify (ms) | transport.marshall (ms) | JSON.parse (ms) | transport.unmarshall (ms) |
| ---------------------------------- | ----- | ------------------- | ----------------------- | --------------- | ------------------------- |
| 1 level - 10 integers | 91 | 4.90 | 18.05 (3.68x) | 3.50 | 17.98 (5.14x) |
| 1 level - 100 integers | 1081 | 65.45 | 92.78 (1.42x) | 20.45 | 122.25 (5.98x) |
| 10 level - 2 integers | 18415 | 654.40 | 2453.37 (3.75x) | 995.02 | 2675.72 (2.69x) |
| 2 level - 10 integers | 991 | 19.74 | 66.82 (3.39x) | 27.85 | 138.45 (4.97x) |
| 3 level - 5 integers | 1396 | 33.66 | 146.33 (4.35x) | 51.54 | 189.07 (3.67x) |
| 1 level - 10 strings - length 10 | 201 | 3.81 | 10.17 (2.67x) | 9.46 | 20.81 (2.20x) |
| 1 level - 100 strings - length 10 | 2191 | 76.53 | 115.74 (1.51x) | 77.71 | 181.24 (2.33x) |
| 2 level - 10 strings - length 10 | 2091 | 30.15 | 97.65 (3.24x) | 95.51 | 213.20 (2.23x) |
| 3 level - 5 strings - length 10 | 2646 | 41.95 | 155.42 (3.71x) | 123.82 | 227.90 (1.84x) |
| 1 level - 10 strings - length 100 | 1101 | 7.74 | 12.19 (1.57x) | 17.34 | 29.83 (1.72x) |
| 1 level - 100 strings - length 100 | 11191 | 66.17 | 112.83 (1.71x) | 197.67 | 282.63 (1.43x) |
| 2 level - 10 strings - length 100 | 11091 | 68.46 | 149.99 (2.19x) | 202.85 | 298.19 (1.47x) |
| 3 level - 5 strings - length 100 | 13896 | 89.46 | 208.21 (2.33x) | 265.25 | 418.42 (1.58x) |
| 1 level - 10 booleans | 126 | 2.84 | 8.14 (2.87x) | 3.06 | 14.20 (4.65x) |
| 1 level - 100 booleans | 1341 | 20.28 | 59.36 (2.93x) | 21.59 | 121.15 (5.61x) |
| 2 level - 10 booleans | 1341 | 23.92 | 89.62 (3.75x) | 31.84 | 137.92 (4.33x) |
| 3 level - 5 booleans | 1821 | 36.15 | 138.24 (3.82x) | 55.71 | 195.50 (3.51x) |

## Store access overhead

The overhead of `store.set` includes
1. overhead of calling `transport.marshall` on value.
2. overhead of putting marshalled data and transport context into C++ map (with exclusive_lock).

The overhead of `store.get` includes
1. overhead of getting marshalled data and transport context from C++ map (with shared_lock).
2. overhead of calling `transport.unmarshall` on marshalled data.

For `store.set`, the numbers below indicate that the cost beyond marshalling is around 0.07~0.4ms per call, depending on payload size (10B to 18KB). `store.get` takes a bit more: 0.06~0.9ms over the same range of payload sizes. If the value in the store is not updated frequently, it is always good to cache it on the JavaScript side.
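
A minimal sketch of such a cache follows. The `ReadableStore` interface and the `CachingReader` class are hypothetical names introduced for this example; a real napajs store would need to be adapted to the interface, and the caller has to decide when a cached value is stale.

```typescript
// Hypothetical minimal read interface for a store.
interface ReadableStore {
    get(key: string): unknown;
}

// Keeps values on the JavaScript side so repeated reads skip the store.
class CachingReader {
    private cache: { [key: string]: unknown } = Object.create(null);

    constructor(private store: ReadableStore) {}

    get(key: string): unknown {
        if (!(key in this.cache)) {
            // Only the first read pays for the store access and unmarshalling.
            this.cache[key] = this.store.get(key);
        }
        return this.cache[key];
    }

    // Call this when the underlying value is known to have changed.
    invalidate(key: string): void {
        delete this.cache[key];
    }
}

// Usage with a fake store that counts how often it is read.
let storeReads = 0;
const fakeStore: ReadableStore = {
    get: (key) => { ++storeReads; return { key: key }; },
};
const reader = new CachingReader(fakeStore);
reader.get("config");
reader.get("config");       // served from the cache
reader.invalidate("config");
reader.get("config");       // reads the store again
console.log(`store reads: ${storeReads}`);
```

The trade-off is staleness: the cache only helps when values change rarely, as stated above.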

Please refer to [store-overhead.ts](./store-overhead.ts) for test details.

\*All operations are repeated 1000 times.

| payload type | size | transport.marshall (ms) | store.set (ms) | transport.unmarshall (ms) | store.get (ms) |
| ---------------------------------- | ----- | ----------------------- | --------------- | ------------------------- | -------------- |
| 1 level - 1 integers | 10 | 2.54 | 73.85 | 3.98 | 65.57 |
| 1 level - 10 integers | 91 | 8.27 | 98.55 | 17.23 | 90.89 |
| 1 level - 100 integers | 1081 | 97.10 | 185.31 | 144.75 | 274.39 |
| 10 level - 2 integers | 18415 | 2525.18 | 2973.17 | 3093.06 | 3927.80 |
| 2 level - 10 integers | 991 | 71.22 | 174.01 | 154.76 | 276.04 |
| 3 level - 5 integers | 1396 | 127.06 | 219.73 | 182.27 | 337.59 |
| 1 level - 10 strings - length 10 | 201 | 14.43 | 79.68 | 31.28 | 84.71 |
| 1 level - 100 strings - length 10 | 2191 | 104.40 | 212.44 | 173.32 | 239.09 |
| 2 level - 10 strings - length 10 | 2091 | 79.54 | 188.72 | 189.29 | 252.83 |
| 3 level - 5 strings - length 10 | 2646 | 155.14 | 257.78 | 276.22 | 342.95 |
| 1 level - 10 strings - length 100 | 1101 | 15.22 | 89.84 | 30.87 | 88.18 |
| 1 level - 100 strings - length 100 | 11191 | 119.89 | 284.05 | 287.17 | 403.77 |
| 2 level - 10 strings - length 100 | 11091 | 137.10 | 299.32 | 244.13 | 297.12 |
| 3 level - 5 strings - length 100 | 13896 | 183.84 | 310.89 | 285.80 | 363.50 |
| 1 level - 10 booleans | 126 | 5.74 | 49.89 | 22.69 | 97.27 |
| 1 level - 100 booleans | 1341 | 57.41 | 157.80 | 106.30 | 218.05 |
| 2 level - 10 booleans | 1341 | 76.93 | 150.25 | 104.02 | 185.82 |
| 3 level - 5 booleans | 1821 | 102.47 | 171.44 | 150.42 | 207.27 |
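
The per-call cost beyond transport can be derived from the table by subtracting the transport column from the corresponding store column and dividing by the 1000 repeats. The sketch below does this for the smallest and largest payloads; `StoreRow`, `perCallExtraMs` and `storeExtra` are names introduced here for illustration.

```typescript
// Each figure in the table is the total over this many operations.
const STORE_REPEAT = 1000;

interface StoreRow {
    payload: string;
    marshallMs: number;
    setMs: number;
    unmarshallMs: number;
    getMs: number;
}

// Smallest and largest payload rows, copied from the table above.
const storeRows: StoreRow[] = [
    { payload: "1 level - 1 integers", marshallMs: 2.54, setMs: 73.85, unmarshallMs: 3.98, getMs: 65.57 },
    { payload: "10 level - 2 integers", marshallMs: 2525.18, setMs: 2973.17, unmarshallMs: 3093.06, getMs: 3927.80 },
];

// Store cost per call (ms) after removing the transport share.
function perCallExtraMs(totalMs: number, transportMs: number): number {
    return (totalMs - transportMs) / STORE_REPEAT;
}

const storeExtra = storeRows.map(row => ({
    payload: row.payload,
    setExtraMs: perCallExtraMs(row.setMs, row.marshallMs),
    getExtraMs: perCallExtraMs(row.getMs, row.unmarshallMs),
}));

for (const row of storeExtra) {
    console.log(`${row.payload}: set +${row.setExtraMs.toFixed(2)}ms, get +${row.getExtraMs.toFixed(2)}ms`);
}
```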
47 changes: 47 additions & 0 deletions benchmark/bench-utils.ts
@@ -0,0 +1,47 @@
/// <summary> Utility function to generate object for testing. </summary>

export function generateString(length: number): string {
    return Array(length).join('x');  // Note: yields (length - 1) characters.
}

export function generateObject(keys: number, depth: number, valueType: string = "string", valueLength = 7) {
    let object: any = {};
    for (let i = 0; i < keys; ++i) {
        let key = `key${i}`;
        let value: any = null;
        if (depth > 1) {
            value = generateObject(keys, depth - 1, valueType, valueLength);
        }
        else if (valueType === 'string') {
            // We try to make each string different.
            value = generateString(valueLength - 1) + (depth * keys + i);
        }
        else if (valueType === 'number') {
            value = i;
        }
        else if (valueType === 'boolean') {
            value = i % 2 === 0;
        }
        object[key] = value;
    }
    return object;
}

export function timeDiffInMs(diff: [number, number]): number {
    return (diff[0] * 1e9 + diff[1]) / 1e6;  // [seconds, nanoseconds] -> ms
}

export function formatTimeDiff(diff: number | [number, number], printUnit: boolean = false): string {
    if (Array.isArray(diff)) {
        diff = timeDiffInMs(diff);
    }
    let message = diff.toFixed(2);
    if (printUnit) {
        message += "ms";
    }
    return message;
}

export function formatRatio(dividend: number, divisor: number): string {
    return "(" + (dividend / divisor).toFixed(2) + "x)";
}
3 changes: 3 additions & 0 deletions benchmark/bench.ts
@@ -0,0 +1,3 @@
import { bench } from './index';

bench();
50 changes: 50 additions & 0 deletions benchmark/execute-overhead.ts
@@ -0,0 +1,50 @@
import * as napa from 'napajs';
import * as mdTable from 'markdown-table';
import { formatTimeDiff } from './bench-utils';

export function bench(zone: napa.zone.Zone): Promise<void> {
    console.log("Benchmarking execute overhead...");

    // Prepare an empty function.
    zone.broadcastSync("function test() {}");
    const ARGS = [1, "hello", {a: 1, b: true}];

    // Warm-up.
    const WARMUP_REPEAT: number = 20;

    console.log("## Execute overhead during warmup\n");
    let warmupTable = [];
    warmupTable.push(["call #", "time (ms)"]);
    for (let i = 0; i < WARMUP_REPEAT; ++i) {
        let start = process.hrtime();
        zone.executeSync("", "test", ARGS);
        warmupTable.push([i.toString(), formatTimeDiff(process.hrtime(start))]);
    }
    console.log(mdTable(warmupTable));

    // executeSync after warm-up
    const REPEAT: number = 10000;
    console.log("## `zone.executeSync` overhead\n");
    let start = process.hrtime();
    for (let i = 0; i < REPEAT; ++i) {
        zone.executeSync("", "test", ARGS);
    }
    console.log(`Elapse of running empty function for ${REPEAT} times: ${formatTimeDiff(process.hrtime(start), true)}\n`);

    // execute after warm-up
    return new Promise<void>((resolve, reject) => {
        let finished = 0;
        let start = process.hrtime();
        for (let i = 0; i < REPEAT; ++i) {
            zone.execute("", "test", ARGS).then(() => {
                ++finished;
                if (finished === REPEAT) {
                    console.log("## `zone.execute` overhead\n");
                    console.log(`Elapse of running empty function for ${REPEAT} times: ${formatTimeDiff(process.hrtime(start), true)}`);
                    console.log('');
                    resolve();
                }
            }).catch(reject);  // Surface execution failures instead of hanging.
        }
    });
}
90 changes: 90 additions & 0 deletions benchmark/execute-scalability.ts
@@ -0,0 +1,90 @@
import * as napa from 'napajs';
import * as assert from 'assert';
import * as mdTable from 'markdown-table';
import { formatTimeDiff } from './bench-utils';

function makeCRCTable() {
    var c;
    var crcTable = [];
    for (var n = 0; n < 256; n++) {
        c = n;
        for (var k = 0; k < 8; k++) {
            c = ((c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1));
        }
        crcTable[n] = c;
    }
    return crcTable;
}

let crcTable = makeCRCTable();
function crc32(str: string) {
    let crc = 0 ^ (-1);
    for (var i = 0; i < str.length; i++) {
        crc = (crc >>> 8) ^ crcTable[(crc ^ str.charCodeAt(i)) & 0xFF];
    }

    return (crc ^ (-1)) >>> 0;
}

function testCrc() {
    const REPEAT: number = 1000000;
    let result = 0;
    let key = Array(1024).join('x');
    for (let i = 0; i < REPEAT; ++i) {
        let hash = crc32(key);
        result = result ^ hash;
    }
    return result;
}

export function bench(zone: napa.zone.Zone): Promise<void> {
    console.log("Benchmarking execute scalability...");

    // Broadcast the CRC32 helpers to every worker.
    zone.broadcastSync(makeCRCTable.toString());
    zone.broadcastSync("var crcTable = makeCRCTable();");
    zone.broadcastSync(crc32.toString());
    zone.broadcastSync(testCrc.toString());

    // Warm-up.
    let crcResult = testCrc();
    zone.broadcastSync("testCrc()");

    // Execute in Node with 1 thread.
    let start = process.hrtime();
    assert(testCrc() === crcResult);
    let nodeTime = formatTimeDiff(process.hrtime(start));

    let executeTime: { [workers: number]: string } = {};
    let scalabilityTest = function(workers: number): Promise<void> {
        let finished = 0;
        let start = process.hrtime();

        return new Promise<void>((resolve, reject) => {
            for (let i = 0; i < workers; ++i) {
                zone.execute("", "testCrc", []).then((result: napa.zone.ExecuteResult) => {
                    assert(crcResult === result.value);
                    ++finished;
                    if (finished === workers) {
                        executeTime[workers] = formatTimeDiff(process.hrtime(start));
                        resolve();
                    }
                }).catch(reject);  // Surface failed executions and assertions.
            }
        });
    };

    // Execute from 1 worker to 8 workers.
    return scalabilityTest(1)
        .then(() => scalabilityTest(2))
        .then(() => scalabilityTest(4))
        .then(() => scalabilityTest(8))
        .then(() => {
            console.log("## Execute scalability\n");
            console.log(mdTable([
                ["node", "napa - 1 worker", "napa - 2 workers", "napa - 4 workers", "napa - 8 workers"],
                [nodeTime, executeTime[1], executeTime[2], executeTime[4], executeTime[8]]
            ]));
            console.log('');
        });
}
20 changes: 20 additions & 0 deletions benchmark/index.ts
@@ -0,0 +1,20 @@
import * as napa from 'napajs';
import * as nodeNapaPerfComp from './node-napa-perf-comparison';
import * as executeOverhead from './execute-overhead';
import * as executeScalability from './execute-scalability';
import * as transportOverhead from './transport-overhead';
import * as storeOverhead from './store-overhead';

export function bench(): Promise<void> {
    // Non-zone related benchmarks.
    transportOverhead.bench();
    storeOverhead.bench();

    // Create zones for execute related benchmark.
    let singleWorkerZone = napa.zone.create('single-worker-zone', { workers: 1 });
    let multiWorkerZone = napa.zone.create('multi-worker-zone', { workers: 8 });

    return nodeNapaPerfComp.bench(singleWorkerZone)
        .then(() => { return executeOverhead.bench(singleWorkerZone); })
        .then(() => { return executeScalability.bench(multiWorkerZone); });
}