Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom decoder #35

Closed
jeaye opened this issue Sep 29, 2020 · 4 comments · Fixed by #36
Closed

Custom decoder #35

jeaye opened this issue Sep 29, 2020 · 4 comments · Fixed by #36

Comments

@jeaye
Copy link
Contributor

jeaye commented Sep 29, 2020

I'm currently replacing transit+json with jsonista for some performance-critical areas of a Clojure server. The only hitch has been supporting a lossless round trip for keyword values. By default, with jsonista, we get this:

user=> (-> (j/write-value-as-string {:system/status :status/good} json-mapper) (j/read-value json-mapper))
#:system{:status "status/good"}

But this is all JSON used internally, so I can take a bit of liberty with using a custom encoding for keywords. For example, I've been benchmarking with this (strcat is like str, but from stringer.core):

(jsonista/object-mapper {:encode-key-fn true
                         :decode-key-fn true
                         :encoders {clojure.lang.Keyword (fn [^clojure.lang.Keyword kw
                                                              ^com.fasterxml.jackson.core.JsonGenerator gen]
                                                           (.writeRawValue gen (strcat "[\"!kw\",\"" kw "\"]")))}})

This works well for encoding, but jsonista doesn't have a way to specify custom decoders. What I'd really love is a custom decoder for JSON arrays so I can detect this custom ["!kw", "status/good"] scenario and generate the correct keyword from it.

I have tried to solve this after the fact, using specter. It's fast, but the optimized specter transformation to replace those specific vectors doubles the benchmark time for an encode/decode round trip. So I think it would be much more efficient to do this as the JSON is being decoded, rather than after.

So, is this something you guys think can work with jsonista? Support for a custom decoder to turn ["!kw", "status/good"] into :status/good?

Thanks for your time and for the awesome library.

Side note: By using jsonista + the custom encoder, criterium says my encoding speed has improved 10x, compared to transit+json. But, due to the second pass needed for specter to correct the keywords, the decoding speed has slowed by 50%, compared to transit+json. If I can get the decoding speeds back to at least as good as transit+json, this will be a huge win.

@ikitommi
Copy link
Member

I think all you need to do is to rebind the list-decoder to your own version, that peeks the first value and look for a suitable parser for the tag, else use the default decoder. See https://github.com/metosin/jsonista/blob/master/src/clj/jsonista/core.clj#L79

@ikitommi
Copy link
Member

But, if you know your data model in advance, e.g. you have a Malli definition, you could use it to compile optimized transformers without any tagging. Will try to cook a working example out of this.

@jeaye
Copy link
Contributor Author

jeaye commented Sep 29, 2020

Thanks so much for the quick and helpful reply. :)

I think all you need to do is to rebind the list-decoder to your own version, that peeks the first value and look for a suitable parser for the tag, else use the default decoder. See https://github.com/metosin/jsonista/blob/master/src/clj/jsonista/core.clj#L79

Ok, got it. I've been through that source and also the PersistentVectorDeserializer source. My only remaining question is this:

Can this be done as a module I can pass in through :modules, to overwrite the clojure-module deserializer for List, or does this need to be done through a fork?

@ikitommi
Copy link
Member

I believe you can do this using a module. If there is something needed in jsonista, please do a PR. Also, new TaggedValueOrPersistentVectorDeserializer PR welcome :)

jeaye added a commit to jeaye/jsonista that referenced this issue Sep 29, 2020
This aims to open the doors for jsonista to become a viable alternative
for people currently using EDN with transit by providing a way to encode
EDN types using a tagged JSON list. Currently, only keywords are
supported.

This brings in a new dependency, stringer.core, which is used when
encoding. It provides a much faster version of `str` and so, in my
opinion, is worthy of inclusion. Since it was brought in, I replaced all
other usages of `str` within jsonista, but there was only one.

Regarding `TaggedValueOrPersistentVectorDeserializer`, it's worth noting
that it doesn't just peek for a tag and then fall back on the default
`PersistentVectorDeserializer`, since peeking seemed to require a call
to `nextValue()` and I found no way to backtack. So it seems I need to
try building the transient and then just special-case the zeroth element
to check for the tag.

Right now, it doesn't do anything special with mapping tags to different
types in the Java code, since there's only one and I figure that change
can be made when it's needed. The Clojure API, however, remains more
general, since the tag needs to come in as a key.  This will make adding
new keys to the API trivial, if more types are supported in the future.

This closes metosin#35.
jeaye added a commit to jeaye/jsonista that referenced this issue Sep 30, 2020
This aims to open the doors for jsonista to become a viable alternative
for people currently using EDN with transit by providing a way to encode
EDN types using a tagged JSON list. Currently, only keywords are
supported.

Regarding `TaggedValueOrPersistentVectorDeserializer`, it's worth noting
that it doesn't just peek for a tag and then fall back on the default
`PersistentVectorDeserializer`, since peeking seemed to require a call
to `nextValue()` and I found no way to backtack. So it seems I need to
try building the transient and then just special-case the zeroth element
to check for the tag.

Right now, it doesn't do anything special with mapping tags to different
types in the Java code, since there's only one and I figure that change
can be made when it's needed. The Clojure API, however, remains more
general, since the tag needs to come in as a key.  This will make adding
new keys to the API trivial, if more types are supported in the future.

This closes metosin#35.
jeaye added a commit to jeaye/jsonista that referenced this issue Sep 30, 2020
This aims to open the doors for jsonista to become a viable alternative
for people currently using EDN with transit by providing a way to encode
EDN types using a tagged JSON list. Currently, only keywords are
supported.

Regarding `TaggedValueOrPersistentVectorDeserializer`, it's worth noting
that it doesn't just peek for a tag and then fall back on the default
`PersistentVectorDeserializer`, since peeking seemed to require a call
to `nextValue()` and I found no way to backtack. So it seems I need to
try building the transient and then just special-case the zeroth element
to check for the tag.

Right now, it doesn't do anything special with mapping tags to different
types in the Java code, since there's only one and I figure that change
can be made when it's needed. The Clojure API, however, remains more
general, since the tag needs to come in as a key.  This will make adding
new keys to the API trivial, if more types are supported in the future.

This closes metosin#35.
jeaye added a commit to jeaye/jsonista that referenced this issue Oct 3, 2020
This aims to open the doors for jsonista to become a viable alternative
for people currently using EDN with transit by providing a way to encode
EDN types using a tagged JSON list. Currently, only keywords are
supported.

Regarding `TaggedValueOrPersistentVectorDeserializer`, it's worth noting
that it doesn't just peek for a tag and then fall back on the default
`PersistentVectorDeserializer`, since peeking seemed to require a call
to `nextValue()` and I found no way to backtack. So it seems I need to
try building the transient and then just special-case the zeroth element
to check for the tag.

Right now, it doesn't do anything special with mapping tags to different
types in the Java code, since there's only one and I figure that change
can be made when it's needed. The Clojure API, however, remains more
general, since the tag needs to come in as a key.  This will make adding
new keys to the API trivial, if more types are supported in the future.

This closes metosin#35.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants