Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update validator to use samlify-validator-js #510

Closed
wants to merge 1 commit into from

Conversation

ivarconr
Copy link

@ivarconr ivarconr commented Feb 2, 2023

@authenio/samlify-node-xmllint has a built in memory leak every time you run the SAML schema validator. It was related to a library it depends on, node-xmllint, where the source code does not exist anymore.

I have therefore create a new library called "samlify-validator-js", building on the existing work, and removed the memory leak.

@authenio/samlify-node-xmllint has a built in memory leak every time you run the SAML schema validator. It was related to a library it depends on, node-xmllint, where the source code does not exist anymore. 

I have therefore create a new library called "samlify-validator-js", building on the existing work, and removed the memory leak.
@tngan
Copy link
Owner

tngan commented Feb 2, 2023

@ivarconr would you like to further explain what you have fixed for the memory leak? the file you have modified is the compiled file, so it's also difficult for others to cross-check the code change.

thanks for your work!

Copy link
Owner

@tngan tngan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check the above comment

@ivarconr
Copy link
Author

ivarconr commented Feb 3, 2023

Yes, I carefully copied of the node-xmlint npm package and included that as a package to the new package (github source code). This was needed because the source code for node-xmlint has been removed.

To identify the the memoery leak I took heap Snapshots from our application and identified that the leak was associated with event listeners being attached to the node process. It does not make any sense for that module to register itself as an event listener on the node process itself. It does this for every execution of the SAML validation.

To highlight the fixes performed I isolated that in its own commit. The change is not possible inspect in the GitHub UI, because of the nature of the compiled JavaScript file being to large.

I simply removed two lines:

  1. process["on"]("uncaughtException",(function(ex){if(!(ex instanceof ExitStatus)){throw ex}}))}
  2. process["stdout"]["once"]("drain",(function(){process["exit"](status)}));console.log(" ");

Also highlighted in the IDE diff:
1: image

2: image

This was the easiest way to fix the problem, and allowed us to not have a memory leak in our application. It is not an option for us to rely on any third party tools in Java or C++, as it increases the complexity of self-hosting Unleash.

Future improvements:
In the future we will look in to generating our own port of xmllint to WASM, and build on that to do XML schema validation in pure javascript. This would allows us to heavily reduce the size of this module (down to KBs instead of MBs). I could not find a suitable project already solving this in a a perfect way, the closest module is https://github.com/noppa/xmllint-wasm. I got xmllint-wasm working as a validator, but the fact that it uses workers behind the scene scares me and my simple tests ended up consuming a lot of CPU.

fyi; We are already using this in Unleash Cloud (https://www.getunleash.io/).

@ivarconr ivarconr requested a review from tngan February 22, 2023 19:48
@tngan
Copy link
Owner

tngan commented Jun 13, 2023

@ivarconr Thank you for your work, I have created @authenio/samlify-xmllint-wasm as the validator using xmllint-wasm. It might be worth to take a look on the CPU usage. I think it's helpful for someone who doesn't want to have java dependencies.

Since the source code of xmllint was removed, I will stop the maintenance of the validator, xmllint.js is a compiled script which I was not aware of before.

@ivarconr
Copy link
Author

ivarconr commented Aug 8, 2023

I have created @authenio/samlify-xmllint-wasm as the validator using xmllint-wasm.

This is definitely the way to go. Removing hard dependencies on Java is a good thing for this project. CPU is usually less of a concern (given its not crazy and you have a decent amount of sign-in requests per second).

@ivarconr
Copy link
Author

ivarconr commented Aug 8, 2023

I looked in to you code. My biggest concern is the dependency on xmllint-wasm which heavily utilized worker threads on Node.js. I would rather not have that introduced in to our application, coming from a library.

from: https://github.com/noppa/xmllint-wasm#installation

The library uses Node.js Worker threads to isolate the Emscripten wrapper from your main process (so that when it calls process.exit your whole server won't go down), which is why Node >= 12 is required.

It might be ok, but requires better understanding of potential side-effects. Any thoughts @tngan?

@ivarconr ivarconr closed this Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants