Performance improvement by replacing "is" charclass lookups with context-specific functions #208

lovell · 2017-03-23T10:13:18Z

Hello, this is a continuation of the performance work started in #204.

Whilst splitting the "is" function into "charclass" and RegExp versions provided a big improvement, the former remains the "hottest" function in this module. Here's what the V8 profiler says after processing ~25MB of XML:

By further splitting "is" into context-specific functions, namely isWhitespace, isQuote and isAttribEnd, and replacing the Object lookups with conditional statements, I see a measurable improvement for the same task:

In real-world use, this change suggests a throughput improvement of roughly 5-10%.

Like last time, the existing and extensive test suite covers all the modified code paths and continues to pass after this change.
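
To illustrate the idea, here is a minimal sketch contrasting the two styles. It is not necessarily the exact code in the commit: the helper name whitespaceClass and the precise character sets are assumptions made for the example.

```javascript
// Before (sketch): build a lookup object per character class,
// then test membership with a generic "is" function.
function charClass (str) {
  return str.split('').reduce(function (set, c) {
    set[c] = true
    return set
  }, {})
}

var whitespaceClass = charClass('\r\n\t ') // hypothetical name for illustration

function is (charclass, c) {
  return charclass[c] === true
}

// After (sketch): one small function per context,
// using plain comparisons instead of an Object property lookup.
function isWhitespace (c) {
  return c === ' ' || c === '\n' || c === '\r' || c === '\t'
}

function isQuote (c) {
  return c === '"' || c === '\''
}

// Assumed here: an unquoted attribute value ends at '>' or whitespace.
function isAttribEnd (c) {
  return c === '>' || isWhitespace(c)
}
```

Both styles answer the same question for a given character, so call sites change from is(whitespaceClass, c) to isWhitespace(c) without altering parser behaviour.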


lovell · 2017-06-22T11:34:23Z

The node-expat benchmark suggests an ~11% improvement might be expected as a result of this change.

sax v1.2.2:

sax x 146,541 ops/sec ±1.06% (88 runs sampled)
node-xml x 131,356 ops/sec ±1.00% (89 runs sampled)
libxmljs x 257,700 ops/sec ±0.92% (85 runs sampled)
node-expat x 478,818 ops/sec ±0.84% (87 runs sampled)

With this change:

sax x 165,607 ops/sec ±0.78% (89 runs sampled)
node-xml x 131,282 ops/sec ±1.26% (86 runs sampled)
libxmljs x 258,247 ops/sec ±0.94% (83 runs sampled)
node-expat x 480,578 ops/sec ±0.76% (88 runs sampled)

jacktuck · 2017-06-22T12:31:55Z

@lovell I don't think this repo is maintained very actively atm :( Have you considered https://github.com/fb55/htmlparser2 ? It has a very similar API and I think it's faster too; they have benchmarks somewhere. I've personally switched to it from sax-js.

lovell · 2017-06-22T17:27:43Z

@isaacs Fantastic, thank you for merging.

Replace charclass lookups with context-specific functions

3730ac4

isWhitespace, isQuote and isAttribEnd for a measurable improvement in performance

isaacs merged commit 3730ac4 into isaacs:master Jun 22, 2017

lovell deleted the dedicated-isWhitespace-isQuote-functions branch June 22, 2017 17:24

lovell restored the dedicated-isWhitespace-isQuote-functions branch June 23, 2017 15:11

lovell mentioned this pull request Jun 28, 2017

Replace name and entity regular expressions with specific functions for ~15% performance improvement #216

Open
