Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace json-parse-helpfulerror with jsonc-parser #1493

Merged
merged 7 commits into from
Jan 26, 2025

Conversation

Torathion
Copy link
Contributor

This replaces the outdated wrapper package for jju, json-parse-helpfulerror, with the most modern json parser: jsonc-parser. This modern parser is the fastest and most widely supported and maintained json parser, used in vscode, eslint and more. Instead of avoiding jsonc files all together, we fully support them now.

Other benefits:

  • jsonc-parser is about twice as fast as jju when parsing.
  • Reduces the total package count from 1054 to 1040
  • Reduces the build size by 20 kb
  • Removes the usage of strip-json-comments

Downsides:

  • json-parse-helpfulerror does have a better performance for using the native JSON.parse. jsonc-parser is 3x slower than JSON.parse due to the extra handling of comments.

I also audited all packages, removing about 12 vulnerabilities. It's now vulnerability free. 🎉

@raineorshine
Copy link
Owner

Hi, thanks for your contribution!

I have just a couple questions about this change.

  • Does jsonc-parser have helpful error messages like json-parse-helpfulerror?
  • Can you provide some sample timing on a normal .json file with the new approach? 3x slower could be significant or insignificant depending on what the base timing is. Normal .json files should be the higher priority since that’s still what most projects use.

@Torathion
Copy link
Contributor Author

Torathion commented Jan 26, 2025

Okay, I've experimented a bit more and wrote this test file:

const jph = require('json-parse-helpfulerror')
const { parse, ParseErrorCode } = require('jsonc-parser')

const validJsonString = '{"name": "John", "age": 30, "city": "New York"}'
const invalidJsonString = '{"name": "John", "age": 30, "city": "New York"'
const jsoncString = `
  {
    "name": "John",
    // Now the age
    "age": 30
}
`

function parseJson(jsonString) {
  const errors = []
  const json = parse(jsonString, errors)

  if (errors.length > 0) {
    let errorString = ''
    const lines = jsonString.split('\n')
    for (const error of errors) {
      const offset = error.offset
      let lineNumber = 1
      let columnNumber = 1
      let currentOffset = 0
      // Calculate line and column from the offset
      for (const line of lines) {
        if (currentOffset + line.length >= offset) {
          columnNumber = offset - currentOffset + 1
          break
        }
        currentOffset += line.length + 1 // +1 for the newline character
        lineNumber++
      }
      errorString += `Error at line ${lineNumber}, column ${columnNumber}: ${ParseErrorCode[error.error]}\n${lines[lineNumber - 1]}\n${' '.repeat(columnNumber - 1)}^\n\n`
    }

    throw new Error(errorString)
  }
  return json
}

// Already call all parse functions to due Node.js caching pre optimizations
JSON.parse(validJsonString)
jph.parse(validJsonString)
parse(validJsonString)
parseJson(validJsonString)

console.time('native: valid')
for (let i = 0; i < 10000; i++) {
  JSON.parse(validJsonString)
}
console.timeEnd('native: valid')

console.time('json-parse-helpfulerror: valid')
for (let i = 0; i < 10000; i++) {
  jph.parse(validJsonString)
}
console.timeEnd('json-parse-helpfulerror: valid')

console.time('jsonc-parser: valid')
for (let i = 0; i < 10000; i++) {
  parse(validJsonString)
}
console.timeEnd('jsonc-parser: valid')

console.time('own: valid')
for (let i = 0; i < 10000; i++) {
  parseJson(validJsonString)
}
console.timeEnd('own: valid')

console.time('native: invalid')
for (let i = 0; i < 10000; i++) {
  try {
    JSON.parse(invalidJsonString)
  } catch {}
}
console.timeEnd('native: invalid')

console.time('json-parse-helpfulerror: invalid')
for (let i = 0; i < 10000; i++) {
  try {
    jph.parse(invalidJsonString)
  } catch {}
}
console.timeEnd('json-parse-helpfulerror: invalid')

console.time('jsonc-parser: invalid')
for (let i = 0; i < 10000; i++) {
  try {
    parse(invalidJsonString)
  } catch {}
}
console.timeEnd('jsonc-parser: invalid')

console.time('own: invalid')
for (let i = 0; i < 10000; i++) {
  try {
    parseJson(invalidJsonString)
  } catch {}
}
console.timeEnd('own: invalid')

console.time('native: jsonc')
for (let i = 0; i < 10000; i++) {
  try {
    JSON.parse(jsoncString)
  } catch {}
}
console.timeEnd('native: jsonc')

console.time('json-parse-helpfulerror: jsonc')
for (let i = 0; i < 10000; i++) {
  try {
    jph.parse(jsoncString)
  } catch {}
}
console.timeEnd('json-parse-helpfulerror: jsonc')

console.time('jsonc-parser: jsonc')
for (let i = 0; i < 10000; i++) {
  try {
    parse(jsoncString)
  } catch {}
}
console.timeEnd('jsonc-parser: jsonc')

console.time('own: jsonc')
for (let i = 0; i < 10000; i++) {
  try {
    parse(jsoncString)
  } catch {}
}
console.timeEnd('own: jsonc')

try {
  jph.parse(invalidJsonString)
} catch (error) {
  console.error('jph:', error.message)
}

try {
  parseJson(invalidJsonString)
} catch (error) {
  console.error('own:', error.message)
}

And this is the result:

native: valid: 7.049ms
json-parse-helpfulerror: valid: 7.318ms
jsonc-parser: valid: 36.635ms
own: valid: 15.935ms
native: invalid: 75.443ms
json-parse-helpfulerror: invalid: 346.464ms
jsonc-parser: invalid: 22.127ms
own: invalid: 160.926ms
native: jsonc: 85.061ms
json-parse-helpfulerror: jsonc: 356.101ms
jsonc-parser: jsonc: 20.922ms
own: jsonc: 15.676ms
jph: Unexpected end of input at 1:48
{"name": "John", "age": 30, "city": "New York"
                                              ^
own: Error at line 1, column 47: CloseBraceExpected
{"name": "John", "age": 30, "city": "New York"
                                              ^
  • jsonc is twice as slow than native on valid
  • jsonc is more than twice as fast on invalid compared to jph
  • jsonc is about 24x faster when accidentally using jsonc content than jph
  • jsonc has a customizable error print api, allowing to output colorized or multiple errors at once while even showing error codes.

I think it's the best of both worlds and using jsonc-parser would greatly increase the robustness of this tool.

@Torathion
Copy link
Contributor Author

I've added the new parseJson utility and added some extra tests to actually check for these helpful errors. Notice, that this utility is only in runLocal.ts. Any other method reading and parsing json files still use the native version, meaning there won't be any helpful errors nor jsonc support.

Copy link
Owner

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional testing and information, that's great. However, I think you need to parse more realistic JSON for our use case, such as npm-check-updates' own package.json. That will give us a better sense of the impact of the parse time difference.

package.json Outdated
@@ -35,7 +35,7 @@
"prettier": "prettier . --check",
"test": "npm run test:unit && npm run test:e2e",
"test:bun": "test/bun-install.sh && mocha test/bun",
"test:unit": "mocha test test/package-managers/*",
"test:unit": "mocha test test/utils test/package-managers/*",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the new test to one of the existing test locations? I would like to avoid having to add a new path here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original test folder is super cluttered.. I could move some internal utils to this folder. This project is getting harder and harder to read with such a flat file tree. When I tried to bug fix something to get tests to work, I spent more time trying to find the correct file in the test folder.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't think that a deeper folder structure necessarily makes the tests better organized. If you have a particular naming convention in mind that we can apply systematically I am open to discussion.

Regardless, this should be proposed in a new issue and is out of scope for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I try to think about it. Maybe it's time to overhaul the test suites all together and switch to something faster like vitest.

pkgFile: pkgFile || undefined,
pkgFile: pkgFile ?? undefined,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty string is not a valid pkgFile, so I think || is actually correct here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I only changed this, because my SonarCube extension for vscode marked this as unsafe.

parseJson(string2).should.deep.equal({ a: 'b', c: ['d', 'e', 'f'] })
})

it('shows descriptive and helpful error messages', () => {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

const errors: ParseError[] = []
const json = parse(jsonString, errors)

if (errors.length > 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give me some more background on this logic and how it compares to jju? I can see that it exposes the line number and column number of errors, but I'm not clear how closely this matches the existing helpful messages.

Copy link
Contributor Author

@Torathion Torathion Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jju is both unmaintained and slower than jsonc-parser. jph does nothing with jju. This is the entire code of this this package. It's just a tiny wrapper function. So yes, another benefit of jsonc-parser as it's much more informative. You can compare both messages at the end of the output of the test file.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, but I have to inquire a little further, if you don't mind. That link is the entire code for json-parse-helpfulerror, which I did not ask for. Also, the question was not about other reasons that jsonc-parser might be better, just about the way that error messages are displayed. Given that json-parse-helpfulerror was originally added to npm-check-updates in order to provide more helpful parsing error messages, I think it is reasonable for me to ask how the proposed change compares on this specific dimension.

So to repeat my question, how does the custom logic that you added compare with the current functionality in npm-check-updates? Are the helpful error messages similar? If so, I'm happy to make the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You see the output in the test file. jsonc-parser does not handle errors on its own. If you pass it an invalid json string, it just returns. It only gives you tools to handle and format errors by only providing error metadata, while jju formats it's own errors. This is how jju handles them. The main differnces in jju:

  • No support for multiple errors
  • No real indication for the type of error
  • No options to externally style or format them

jsonc-parser also gives you the option to batch error messages by parsing an errors array. Extra errors will be pushed onto this array and can be emitted at once.

If this still doesn't answer your question, I have no idea what you intent with this question.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does answer my question :).

Comment on lines 267 to 264
pkgFile: pkgFile || undefined,
pkgFile: pkgFile ?? undefined,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll change this too.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Torathion
Copy link
Contributor Author

Torathion commented Jan 26, 2025

Ok, I've tested it again and found out that jsonc-parser has a stripComments function that is much faster than strip-json-commens. I've tested it with the ncu package.json with an extra comment as test content. This is the test file:

import jph from 'json-parse-helpfulerror'
import { parse, stripComments } from 'jsonc-parser'
import { readFile } from 'node:fs/promises'
import { join } from 'node:path'

const content = await readFile(join(process.cwd(), 'package.json'), 'utf-8')

export default function parseJson(jsonString) {
  try {
    return JSON.parse(jsonString)
  } catch {
    const errors = []
    const json = parse(jsonString, errors)

    if (errors.length > 0) {
      let errorString = ''
      const lines = jsonString.split('\n')
      for (const error of errors) {
        const offset = error.offset
        let lineNumber = 1
        let columnNumber = 1
        let currentOffset = 0
        // Calculate line and column from the offset
        for (const line of lines) {
          if (currentOffset + line.length >= offset) {
            columnNumber = offset - currentOffset + 1
            break
          }
          currentOffset += line.length + 1 // +1 for the newline character
          lineNumber++
        }
        // @ts-expect-error due to --isolatedModules forbidding to implement ambient constant enums.
        errorString += `Error at line ${lineNumber}, column ${columnNumber}: ${ParseErrorCode[error.error]}\n${lines[lineNumber - 1]}\n${' '.repeat(columnNumber - 1)}^\n\n`
      }

      throw new SyntaxError(errorString)
    }
    return json
  }
}

console.time('native')
for (let i = 0; i < 1000; i++) {
  try {
    JSON.parse(stripComments(content))
  } catch {}
}
console.timeEnd('native')

console.time('jph')
for (let i = 0; i < 1000; i++) {
  try {
    jph.parse(stripComments(content))
  } catch {}
}
console.timeEnd('jph')

console.time('jsonc')
for (let i = 0; i < 1000; i++) {
  try {
    parseJson(stripComments(content))
  } catch {}
}
console.timeEnd('jsonc')

And the output:

native: 71.708ms
jph: 63.475ms
jsonc: 62.953ms

So, the best course of action would be to always use stripComments from json-parser. Checking for .jsonc wouldn't make much sense as comments can still exist in normal .json files.

If I would to malform the package.json file, this would be the performance:

native: 97.454ms
jph: 322.203ms
jsonc: 215.128ms

@raineorshine
Copy link
Owner

Great, I like that approach. Best of both worlds. Thanks!

@Torathion
Copy link
Contributor Author

Okay, I had a little idea how to make the errors even more helpful: By displaying a code snippet.

Given a json file like:

{
    "test": {
         "a": test
    }
}

With test being an invalid symbol, my little utility should now print this:

Error at line 3, column 16: InvalidSymbol
{
      "test": {
          "a": test
               ^
        }
  }

Error at line 4, column 9: ValueExpected
      "test": {
          "a": test
        }
        ^
  }

@Torathion
Copy link
Contributor Author

Torathion commented Jan 26, 2025

I also found a little optimization in runLocal. Instead of using duplicate code, I made another function getOptionsPerPage. I also needed to type the groups, but they were never given a real type, so I picked the literal type from versions.ts and called it DependencyGroup.

Copy link
Owner

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I like the new implementation! Just a small question.

*/
function ensureLineDisplay(line: string): string {
if (!line.length) return '<empty>\n'
return line.length > stdoutColumns ? '<line too long to display>\n' : `${line}\n`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about displaying as many characters as the terminal will allow?

return `${line.slice(0, stdoutColumns}\n`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that, but it would still produce garbage data, if we are talking about a high marker position.

Speaking of that, the marker line would be problematic as well. I guess it could be done, but it needs more work. I do plan to extract this utility into it's own npm package and keep work on it in the future. But for now, I'll hide it.


/**
* Builds a json code snippet to mark and contextualize the found error.
* This snippet consists off 5 lines and the erroneous line is always in the middle.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"consists of"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just me typing faster than I can think. I'll take care of that.

@Torathion
Copy link
Contributor Author

Torathion commented Jan 26, 2025

Okay, I've changed both. First cutting off large lines instead of hiding them and second, hiding the cursor, if the line is too large. Of course, this is only a temporary thing. I need to fully conceptualize an indentation window to fully show the context, but this should be enough for this issue, I hope.

For example, I tested an inlined json string and setting the column size to 40:

SyntaxError: Error at line 1, column 98: InvalidSymbol
{ "Lorem": "ipsum dolor sit amet, conset

Error at line 1, column 103: ValueExpected
{ "Lorem": "ipsum dolor sit amet, conset

Sadly, I couldn't add a test for that, because mocha didn't let me change process.stdout.columns

@raineorshine
Copy link
Owner

Great, thanks!

@raineorshine raineorshine merged commit 35300f0 into raineorshine:main Jan 26, 2025
7 checks passed
@Torathion Torathion deleted the replace-jph branch January 26, 2025 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants