Skip to content

simdutf/SimdUnicode

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SimdUnicode

.NET

This is a fast C# library to validate UTF-8 strings.

Motivation

We seek to speed up the Utf8Utility.GetPointerToFirstInvalidByte function. Using the algorithm used by Node.js, Oracle GraalVM and other important systems.

The algorithm in question is part of popular JavaScript runtimes such as Node.js and Bun, by PHP, by Oracle GraalVM and many important systems.

The function is private in the Microsoft Runtime, but we can expose it manually.

Requirements

We recommend you install .NET 8: https://dotnet.microsoft.com/en-us/download/dotnet/8.0

Running tests

dotnet test

To see which tests are running, we recommend setting the verbosity level:

dotnet test -v=normal

More details could be useful:

dotnet test -v d

To get a list of available tests, enter the command:

dotnet test --list-tests

To run specific tests, it is helpful to use the filter parameter:

dotnet test --filter TooShortErrorAvx2

Or to target specific categories:

dotnet test --filter "Category=scalar"

Running Benchmarks

To run the benchmarks, run the following command:

cd benchmark
dotnet run -c Release

To run just one benchmark, use a filter:

cd benchmark
dotnet run --configuration Release --filter "*Arabic-Lipsum*"

If you are under macOS or Linux, you may want to run the benchmarks in privileged mode:

cd benchmark
sudo dotnet run -c Release

Building the library

cd src
dotnet build

Code format

We recommend you use dotnet format. E.g.,

dotnet format

Programming tips

You can print the content of a vector register like so:

        public static void ToString(Vector256<byte> v)
        {
            Span<byte> b = stackalloc byte[32];
            v.CopyTo(b);
            Console.WriteLine(Convert.ToHexString(b));
        }
        public static void ToString(Vector128<byte> v)
        {
            Span<byte> b = stackalloc byte[16];
            v.CopyTo(b);
            Console.WriteLine(Convert.ToHexString(b));
        }

Performance tips

  • Be careful: Vector128.Shuffle is not the same as Ssse3.Shuffle nor is Vector128.Shuffle the same as Avx2.Shuffle. Prefer the latter.
  • Similarly Vector128.Shuffle is not the same as AdvSimd.Arm64.VectorTableLookup, use the latter.

More reading