Description
Description
Much ado has been made about Newtonsoft
vs. System.Text.Json
, most notably the latter's strictness when it comes to invalid input.
System.Text.Json
's adherence to the spec and to best JSON practices is admirable. However, there are times when we must process JSON which does not conform to the spec, especially when the JSON input comes from third-party integrations or is otherwise outside of our control. System.Text.Json
has offered up configuration options which let us deviate from the spec, such as allowing trailing commas or escaping a different set of characters. The final option a user has is to write their own custom converter.
However, when JsonNode
is populated with invalid UTF-16 surrogate pairs, the exception it throws is far too early to handle with a custom converter. When calling Deserialize<T>
, an exception is thrown before custom converters can be run. object.ToString()
throws an exception as well, which is a major violation of the convention for that method.
Notably, JsonElement
does not suffer from either of these drawbacks.
Reproduction Steps
using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;
JsonElement badJsonElement = JsonDocument.Parse("""{"HasBadData":"\uDB40"}""").RootElement;
JsonNode badJsonNode = JsonNode.Parse("""{"HasBadData":"\uDB40"}""")!;
var tolerateBadJsonOptions = new JsonSerializerOptions
{
Converters =
{
new InvalidUtf16Converter()
}
};
Console.WriteLine(badJsonElement.ToString()); // Prints {"HasBadData":"\uDB40"}
Console.WriteLine(badJsonNode.ToString()); // Throws exception
Console.WriteLine(badJsonElement.Deserialize<MyModel>(tolerateBadJsonOptions)); // Prints MyModel { HasBadData = \uDB40 }
Console.WriteLine(badJsonNode.Deserialize<MyModel>(tolerateBadJsonOptions)); // Throws exception
record MyModel
{
public string? HasBadData { get; set; }
}
sealed class InvalidUtf16Converter : JsonConverter<string>
{
public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
{
try
{
return reader.GetString();
}
catch (InvalidOperationException)
{
var bytes = reader.ValueSpan;
var sb = new StringBuilder(bytes.Length);
foreach (var b in bytes)
sb.Append(Convert.ToChar(b));
return sb.ToString();
}
}
public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
{
writer.WriteStringValue(value);
}
}
Expected behavior
JsonNode.ToString()
should not throw, and ideally returns the same result as JsonElement.ToString()
JsonNode.Deserialize<T>(JsonSerializerOptions? options = null)
ideally would run the custom converters before throwing an exception.
Actual behavior
Both JsonNode.ToString()
and JsonNode.Deserialize<T>(JsonSerializerOptions? options = null)
throw an exception in the code example.
Regression?
In .NET 6, JsonDocument.Parse
threw an exception for the input here. I'm not sure if it's been relaxed intentionally or as an oversight, but hopefully we keep it lax and leave the exceptions to the deserialization!
https://dotnetfiddle.net/UWILZK
Known Workarounds
Use JsonElement
instead, where possible.
Configuration
No response
Other information
No response