Commit

Typo and small additions to tests
xarantolus committed Feb 26, 2021
1 parent a538024 commit f01847b
Showing 3 changed files with 31 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@
# jsonextract
`jsonextract` is a Go library for extracting JSON and JavaScript objects from any source. It can be used for data extraction tasks like web scraping.

If any text looks like a JavaScript object or is close looking like JSON, it will be converted to it.
If any text looks like a JavaScript object or is close to looking like JSON, it will be converted to it.

### Extractor program
There's a small extractor program that uses this library to get data from URLs and files.
@@ -86,7 +86,7 @@ results in
### Notes
* While the functions take an `io.Reader` and stream data from it without buffering everything in memory, the underlying JS lexer uses `ioutil.ReadAll`. That means that this doesn't work well on files that are larger than memory.
* When extracting objects from JavaScript files using [`Reader`](https://pkg.go.dev/github.com/xarantolus/jsonextract#Reader), you can end up with many arrays that look like `[0]`, `[1]`, `["i"]`, which is a result of indices being used in the script. You have to filter these out yourself.
* While this package supports most number formats, there are some that don't work because the lexer doesn't support them. One of those is underscores in numbers. An example is that in JavaScript `2175` can be written as `2_175` or `0x8_7_f`, but that doesn't work here (HEX number do however). Another example are numbers with a leading zero; they are rejected by the lexer because it's not clear if they should be interpreted as octal or decimal.
* While this package supports most number formats, there are some that don't work because the lexer doesn't support them. One of those is underscores in numbers. An example is that in JavaScript `2175` can be written as `2_175` or `0x8_7_f`, but that doesn't work here (normal HEX numbers do however). Another example are numbers with a leading zero; they are rejected by the lexer because it's not clear if they should be interpreted as octal or decimal.
* Another example of unsupported number types are the float values `Inf`, `+Inf`, `-Inf` and other infinity values. While `NaN` is converted to `null` (as `NaN` is not valid JSON), infinity values don't have an appropriate JSON representation

### Changelog
28 changes: 28 additions & 0 deletions internal/fuzz/generate_test.go
@@ -503,4 +503,32 @@ var fuzzData = []struct {
[]byte(`{"test":3}`),
},
},
{
`{"value":25,"another":"test","quoted":{"is this even valid in JS?":75},"nextkey":"this\ntemplate literal\n\nspans\n\nmany \n\n\nlines"}`,
nil,
},
{
`{"subkey":"value"}`,
nil,
},
{
`{"subkey":"value"}`,
nil,
},
{
`{"@context":"https://schema.org","@type":"Product","aggregateRating":{"@type":"AggregateRating","ratingValue":"3.5","reviewCount":"11"},"description":"jsonextract is a Go library","name":"jsonextract","image":"microwave.jpg","offers":{"@type":"Offer","availability":"https://schema.org/InStock","price":"00.00","priceCurrency":"USD"},"review":[{"@type":"Review","author":"Ellie","datePublished":"2012-09-06","reviewBody":"I'm still not sure if this works.","name":"Test","reviewRating":{"@type":"Rating","bestRating":"5","ratingValue":"1","worstRating":"1"}},{"@type":"Review","author":"Lucas","datePublished":"2014-02-21","reviewBody":"Great microwave for the price.","name":"Value purchase","reviewRating":{"@type":"Rating","bestRating":"5","ratingValue":"4","worstRating":"1"}}]}`,
nil,
},
{
`{}`,
nil,
},
{
`[]`,
nil,
},
{
"[\" this is a template string. \",\"in JS you can escape` the quote character `\"]",
nil,
},
}
1 change: 1 addition & 0 deletions objects.go
@@ -96,6 +96,7 @@ func Objects(r io.Reader, o []ObjectOption) (err error) {
if opt.match(m) {
// If an object matched, we no longer care about its child elements
return opt.Callback(b)
// TODO: Go deeper if a certain error was returned by Callback
}
}
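The TODO in this hunk leaves open how a callback would ask the traversal to continue into child elements. One common Go pattern for this is a sentinel error checked with `errors.Is`. The sketch below shows that pattern on a toy tree. Everything in it (`errDescend`, `node`, `walk`, `visited`) is hypothetical and is not taken from `jsonextract`; it only illustrates the idea, not how the library does or will implement it.

```go
package main

import (
	"errors"
	"fmt"
)

// errDescend is a hypothetical sentinel: a callback returns it to ask
// the walker to also visit the children of the current node.
var errDescend = errors.New("descend into child elements")

// node is a toy stand-in for a parsed element with children.
type node struct {
	name     string
	children []*node
}

// walk calls callback on n. Children are visited only when the callback
// returns errDescend; any other error stops the walk and is returned.
func walk(n *node, callback func(*node) error) error {
	err := callback(n)
	if errors.Is(err, errDescend) {
		for _, c := range n.children {
			if err := walk(c, callback); err != nil {
				return err
			}
		}
		return nil
	}
	return err
}

// visited returns the names seen when the callback requests a descent
// only for the node called descendInto.
func visited(root *node, descendInto string) ([]string, error) {
	var names []string
	err := walk(root, func(n *node) error {
		names = append(names, n.name)
		if n.name == descendInto {
			return errDescend
		}
		return nil
	})
	return names, err
}

func main() {
	root := &node{name: "root", children: []*node{{name: "a"}, {name: "b"}}}
	names, err := visited(root, "root")
	fmt.Println(names, err)
}
```

A sentinel keeps the callback signature unchanged, which is why it is sketched here; a boolean return value would be an alternative design.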
