This is a fast C# library to validate UTF-8 strings.
We seek to speed up the Utf8Utility.GetPointerToFirstInvalidByte
function. Using the algorithm used by Node.js, Oracle GraalVM and other important systems.
- John Keiser, Daniel Lemire, Validating UTF-8 In Less Than One Instruction Per Byte, Software: Practice and Experience 51 (5), 2021
The algorithm in question is part of popular JavaScript runtimes such as Node.js and Bun, by PHP, by Oracle GraalVM and many important systems.
The function is private in the Microsoft Runtime, but we can expose it manually.
We recommend you install .NET 8: https://dotnet.microsoft.com/en-us/download/dotnet/8.0
dotnet test
To see which tests are running, we recommend setting the verbosity level:
dotnet test -v=normal
More details could be useful:
dotnet test -v d
To get a list of available tests, enter the command:
dotnet test --list-tests
To run specific tests, it is helpful to use the filter parameter:
dotnet test --filter TooShortErrorAvx2
Or to target specific categories:
dotnet test --filter "Category=scalar"
To run the benchmarks, run the following command:
cd benchmark
dotnet run -c Release
To run just one benchmark, use a filter:
cd benchmark
dotnet run --configuration Release --filter "*Arabic-Lipsum*"
If you are under macOS or Linux, you may want to run the benchmarks in privileged mode:
cd benchmark
sudo dotnet run -c Release
cd src
dotnet build
We recommend you use dotnet format
. E.g.,
dotnet format
You can print the content of a vector register like so:
public static void ToString(Vector256<byte> v)
{
Span<byte> b = stackalloc byte[32];
v.CopyTo(b);
Console.WriteLine(Convert.ToHexString(b));
}
public static void ToString(Vector128<byte> v)
{
Span<byte> b = stackalloc byte[16];
v.CopyTo(b);
Console.WriteLine(Convert.ToHexString(b));
}
- Be careful:
Vector128.Shuffle
is not the same asSsse3.Shuffle
nor isVector128.Shuffle
the same asAvx2.Shuffle
. Prefer the latter. - Similarly
Vector128.Shuffle
is not the same asAdvSimd.Arm64.VectorTableLookup
, use the latter.
- https://github.com/dotnet/coreclr/pull/21948/files#diff-2a22774bd6bff8e217ecbb3a41afad033ce0ca0f33645e9d8f5bdf7c9e3ac248
- dotnet/runtime#41699
- https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/
- https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/coding-style/coding-conventions