Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement by replacing "is" charclass lookups with context-specific functions #208

Merged

Conversation

lovell
Copy link
Contributor

@lovell lovell commented Mar 23, 2017

Hello, this is a continuation of the performance work started in #204.

Whilst splitting the is function into "charclass" vs RegExp versions provided a big improvement, the former of these remains the "hottest" function in this module. Here's what the V8 profiler says after processing ~25MB of XML:

By further splitting is into context-specific functions, namely isWhitespace, isQuote and isAttribEnd, and replacing the Object lookups with conditional statements, I see a measurable improvement for the same task:

The real-world performance gains of this change suggest a throughput improvement in the ~5-10% range.

Like last time, the existing and extensive test suite covers all the modified code paths and continues to pass after this change.

isWhitespace, isQuote and isAttribEnd for a measurable
improvement in performance
@lovell
Copy link
Contributor Author

lovell commented Jun 22, 2017

The node-expat benchmark suggests an ~11% improvement might be expected as a result of this change.

sax v1.2.2:

sax x 146,541 ops/sec ±1.06% (88 runs sampled)
node-xml x 131,356 ops/sec ±1.00% (89 runs sampled)
libxmljs x 257,700 ops/sec ±0.92% (85 runs sampled)
node-expat x 478,818 ops/sec ±0.84% (87 runs sampled)

With this change:

sax x 165,607 ops/sec ±0.78% (89 runs sampled)
node-xml x 131,282 ops/sec ±1.26% (86 runs sampled)
libxmljs x 258,247 ops/sec ±0.94% (83 runs sampled)
node-expat x 480,578 ops/sec ±0.76% (88 runs sampled)

@jacktuck
Copy link

jacktuck commented Jun 22, 2017

@lovell I don't think this repo is maintained very actively atm :( How you considered https://github.com/fb55/htmlparser2 ? It has a very similar API and I think it's faster too, they have benchmarks somewhere. I've personally switched to it from sax-js.

@isaacs isaacs merged commit 3730ac4 into isaacs:master Jun 22, 2017
@lovell lovell deleted the dedicated-isWhitespace-isQuote-functions branch June 22, 2017 17:24
@lovell
Copy link
Contributor Author

lovell commented Jun 22, 2017

@isaacs Fantastic, thank you for merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants