JsonNode throws exception too early on invalid JSON #111186

Closed
@RenderMichael

Description

Much ado has been made about Newtonsoft vs. System.Text.Json, most notably the latter's strictness when it comes to invalid input.

System.Text.Json's adherence to the spec and to best JSON practices is admirable. However, there are times when we must process JSON which does not conform to the spec, especially when the JSON input comes from third-party integrations or is otherwise outside of our control. System.Text.Json has offered up configuration options which let us deviate from the spec, such as allowing trailing commas or escaping a different set of characters. The final option a user has is to write their own custom converter.
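For context, here is a minimal sketch of the kind of configuration options mentioned above. The variable names and sample inputs are illustrative only and not taken from any real integration; AllowTrailingCommas, ReadCommentHandling and Encoder are existing JsonSerializerOptions properties.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Illustrative options that relax System.Text.Json's default strictness.
var lenientOptions = new JsonSerializerOptions
{
    // Accept a trailing comma at the end of arrays and objects when reading.
    AllowTrailingCommas = true,
    // Skip // and /* */ comments instead of throwing when reading.
    ReadCommentHandling = JsonCommentHandling.Skip,
    // Escape a smaller set of characters when writing.
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
};

// The trailing comma would be rejected with default options.
int[]? values = JsonSerializer.Deserialize<int[]>("[1, 2, 3,]", lenientOptions);
Console.WriteLine(values!.Length); // Prints 3

// The relaxed encoder leaves characters such as < and > unescaped.
string json = JsonSerializer.Serialize("<b>", lenientOptions);
Console.WriteLine(json);
```

None of these options covers invalid surrogates, which is why a custom converter is the remaining route.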

However, when a JsonNode contains a string with an invalid UTF-16 surrogate (for example, the lone high surrogate \uDB40), it throws far too early for a custom converter to handle. Calling Deserialize<T> on the node throws before any custom converter runs. Calling ToString() on the node throws as well, which is a major violation of the convention that object.ToString() should not throw.

Notably, JsonElement does not suffer from either of these drawbacks.

Reproduction Steps

using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;

JsonElement badJsonElement = JsonDocument.Parse("""{"HasBadData":"\uDB40"}""").RootElement;
JsonNode badJsonNode = JsonNode.Parse("""{"HasBadData":"\uDB40"}""")!;

var tolerateBadJsonOptions = new JsonSerializerOptions
{
    Converters = 
    {
        new InvalidUtf16Converter()
    }
};

Console.WriteLine(badJsonElement.ToString()); // Prints {"HasBadData":"\uDB40"}
Console.WriteLine(badJsonNode.ToString()); // Throws exception

Console.WriteLine(badJsonElement.Deserialize<MyModel>(tolerateBadJsonOptions)); // Prints MyModel { HasBadData = \uDB40 }
Console.WriteLine(badJsonNode.Deserialize<MyModel>(tolerateBadJsonOptions)); // Throws exception

record MyModel
{
    public string? HasBadData { get; set; }
}

sealed class InvalidUtf16Converter : JsonConverter<string>
{
    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        try
        {
            return reader.GetString();
        }
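        catch (InvalidOperationException)
        {
            // GetString() throws when the string contains invalid UTF-16 surrogates.
            // Fall back to the raw (still escaped) bytes of the token, one char per byte.
            var bytes = reader.ValueSpan;
            var sb = new StringBuilder(bytes.Length);
            foreach (var b in bytes)
                sb.Append(Convert.ToChar(b));
            return sb.ToString();
        }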
    }

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(value);
    }
}

Expected behavior

JsonNode.ToString() should not throw, and should ideally return the same result as JsonElement.ToString().

JsonNode.Deserialize<T>(JsonSerializerOptions? options = null) should ideally run the custom converters before throwing an exception.

Actual behavior

Both JsonNode.ToString() and JsonNode.Deserialize<T>(JsonSerializerOptions? options = null) throw an exception in the code example.

Regression?

In .NET 6, JsonDocument.Parse threw an exception for the input here. I'm not sure whether it was relaxed intentionally or by oversight, but hopefully we keep it lax and leave the exceptions to deserialization!

https://dotnetfiddle.net/UWILZK

Known Workarounds

Use JsonElement instead, where possible.

Configuration

No response

Other information

No response
