Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsonNode throws exception too early on invalid JSON #111186

Closed
RenderMichael opened this issue Jan 8, 2025 · 2 comments
Closed

JsonNode throws exception too early on invalid JSON #111186

RenderMichael opened this issue Jan 8, 2025 · 2 comments

Comments

@RenderMichael
Copy link
Contributor

Description

Much ado has been made about Newtonsoft vs. System.Text.Json, most notably the latter's strictness when it comes to invalid input.

System.Text.Json's adherence to the spec and to best JSON practices is admirable. However, there are times when we must process JSON which does not conform to the spec, especially when the JSON input comes from third-party integrations or is otherwise outside of our control. System.Text.Json has offered up configuration options which let us deviate from the spec, such as allowing trailing commas or escaping a different set of characters. The final option a user has is to write their own custom converter.

However, when JsonNode is populated with invalid UTF-16 surrogate pairs, the exception it throws is far too early to handle with a custom converter. When calling Deserialize<T>, an exception is thrown before custom converters can be run. object.ToString() throws an exception as well, which is a major violation of the convention for that method.

Notably, JsonElement does not suffer from either of these drawbacks.

Reproduction Steps

using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;

JsonElement badJsonElement = JsonDocument.Parse("""{"HasBadData":"\uDB40"}""").RootElement;
JsonNode badJsonNode = JsonNode.Parse("""{"HasBadData":"\uDB40"}""")!;

var tolerateBadJsonOptions = new JsonSerializerOptions
{
    Converters = 
    {
        new InvalidUtf16Converter()
    }
};

Console.WriteLine(badJsonElement.ToString()); // Prints {"HasBadData":"\uDB40"}
Console.WriteLine(badJsonNode.ToString()); // Throws exception

Console.WriteLine(badJsonElement.Deserialize<MyModel>(tolerateBadJsonOptions)); // Prints MyModel { HasBadData = \uDB40 }
Console.WriteLine(badJsonNode.Deserialize<MyModel>(tolerateBadJsonOptions)); // Throws exception

record MyModel
{
    public string? HasBadData { get; set; }
}

sealed class InvalidUtf16Converter : JsonConverter<string>
{
    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        try
        {
            return reader.GetString();
        }
        catch (InvalidOperationException)
        {
             var bytes = reader.ValueSpan;
            var sb = new StringBuilder(bytes.Length);
            foreach (var b in bytes)
                sb.Append(Convert.ToChar(b));
            return sb.ToString();
        }
    }

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(value);
    }
}

Expected behavior

JsonNode.ToString() should not throw, and ideally returns the same result as JsonElement.ToString()

JsonNode.Deserialize<T>(JsonSerializerOptions? options = null) ideally would run the custom converters before throwing an exception.

Actual behavior

Both JsonNode.ToString() and JsonNode.Deserialize<T>(JsonSerializerOptions? options = null) throw an exception in the code example.

Regression?

In .NET 6, JsonDocument.Parse threw an exception for the input here. I'm not sure if it's been relaxed intentionally or as an oversight, but hopefully we keep it lax and leave the exceptions to the deserialization!

https://dotnetfiddle.net/UWILZK

Known Workarounds

Use JsonElement instead, where possible.

Configuration

No response

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 8, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@eiriktsarpalis
Copy link
Member

Appears to be a duplicate of #32291.

@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants