Tags: MrUnbelievable92/MaxMath
V2.2.0

### Known Issues

- `half8` `==` and `!=` operators don't conform to the IEEE 754 standard - Unity has not yet reacted to my bug report regarding their `half` implementation
- `(s)byte`, `(u)short` vector and `(U)Int128` multiplication, division and modulo operations by compile time constants are not optimal. For `(U)Int128`, this requires a new Burst feature à la `T Constant.ForceCompileTimeEvaluation<T, U>(Func<T, U> code)` (proposed). Work is currently being done on `(s)byte` and `(u)short` vectors in this regard, which will beat any compiler. The current (tested) state of all possible optimizations is included.
- `pow` functions with compile time constant exponents currently do not handle many decimal numbers - `math.rsqrt` would often be used in those cases for optimal performance, but it is actually slower when `Unity.Burst.FloatMode` is set to anything but `FloatMode.Fast`. To guarantee optimal performance, compile time access to the current `FloatMode` would be needed (proposed)
- double `(r)cbrt` and thus possibly (u)int `intcbrt` functions are currently not optimized

### Fixes

- linked `float8` `rcp` and `rsqrt` functions to Burst's `FloatMode` and `FloatPrecision`
- `short.MinValue / -1` now correctly overflows to `short.MinValue` when dividing a `short16` vector by another `short16` vector when compiling for AVX or higher
- fixed scalar `quarter` to `double` conversion for when the `quarter` value is negative
- fixed scalar `half` to `quarter` conversion for when the `half` value is negative
- fixed vector `quarter` to `ulong` conversion for when a `quarter` value is negative
- fixed `(u)short8` to `quarter8` conversion

### Additions

- added saturation arithmetic to the library for all scalar and vector types. Saturation arithmetic clamps the result of an operation to `type.MinValue` or `type.MaxValue` if under- or overflow occurs, respectively, and has single-instruction hardware support for `(s)byte`s and `(u)short`s. The included functions are:
  - `addsaturated`
  - `subsaturated`
  - `mulsaturated`
  - `divsaturated` (only clamps division of floating point types and signed division of, for instance, `sbyte.MinValue` (= -128) `/ -1` to 127, which would cause a hardware exception for `int`s and `long`s)
  - `castsaturated` (all types to all other types with a smaller range)
  - `csumsaturated`
  - `cprodsaturated`
- added high performance `(U)Int128` types with full library support, meaning: all operators and type conversions as well as all functions support these types. Most operations of both types, in Burst code, compile down to optimal machine code. Exceptions: 1) signed 64x64 bit to 128 bit multiplication 2) `*`, `/`, `%` and `divrem` functions with a scalar compile time constant argument (see: Known Issues #2)
- added `Random128` XOR-Shift pseudo random number generator for generating `(U)Int128`s
- added high performance and high accuracy `(r)cbrt` - (reciprocal) cube root functions for scalar and vector `float` and `double` types, based on a research paper from 2021. An optional `bool` parameter, `false` by default, lets the caller decide whether negative input values should be handled correctly (which is not the case with `math.pow(x, 1f/3f)`)
- added high performance `intcbrt` - integer cube root functions for all scalar and vector integer types. For signed integer types, an optional `bool` parameter, `false` by default, lets the caller decide whether negative input values should be handled correctly (which is not the case with `math.pow(x, 1f/3f)`)
- added a `log` function to all scalar and vector `float` and `double` types with a second parameter `b`, which is the logarithm's base
- added `reversebytes` functions for all scalar and vector types, which convert between big endian and little endian byte order. All of them (scalar, vector) compile down to single hardware instructions
- added `pow` functions with scalar exponents for `float` and `double` scalars and vectors, with optimizations for selected constant exponents (not necessarily whole exponents)
- added function overloads to all functions for scalar `(s)byte`s and `(u)short`s to resolve the function call ambiguity that was already present in `Unity.Mathematics`; this may also improve performance in some cases
- added a static readonly `New` property to `RandomX` XOR-Shift pseudo random generators. It calls `Environment.TickCount` internally (and is thus seeded somewhat randomly), makes sure the seed is non-zero and can be called from Burst native code
- added `fastrcp` functions for `float` scalars and vectors, faster (and substantially less accurate) than `FloatPrecision.Low`, `FloatMode.Fast` Burst implementations
- added `fastrsqrt` functions for `float` scalars and vectors, faster (and substantially less accurate) than `FloatPrecision.Low`, `FloatMode.Fast` Burst implementations

### Improvements

- added AVX and AVX2 code for `float8` `sin`, `cos`, `tan`, `sincos`, `asin`, `acos`, `atan`, `atan2`, `sinh`, `cosh`, `tanh`, `pow`, `exp`, `exp2`, `exp10`, `log`, `log2`, `log10` and `fmod` (and the `%` operator)
- optimized many `/`, `%`, `*` and `divrem` operations with a scalar compile time constant argument for `(s)byte` vectors (see Known Issues #2), which were previously not optimized (...optimally/at all) by Burst
- added SSE2 fallback code for converting AVX vector types to SSE vector types and vice versa (for example: `short16` (256 bit) to `byte16` (128 bit))
- scalar `(s)byte` and `(u)short` `rol` and `ror` functions now compile down to single hardware instructions
- improved performance and/or reduced code size of nearly all vector comparison operations
- improved performance of - and added SSE2 fallback code for - bitfield to boolean vector conversion (`toboolX` and thus also `select(vector a, vector b, bitmask c)`)
- improved performance of `intpow` functions in general and for when the exponent is a compile time constant
- improved performance and reduced code size of `compareto` vector functions (especially for unsigned types)
- added more optimizations to `isdivisible`
- improved performance of `intsqrt` functions for `(u)long` and `(s)byte` scalar and vector types considerably
- reduced code size of `ispow2` vector functions
- reduced code size of `(s)byte` vector-by-vector division
- improved performance of `Random64`'s `(u)long4` generation if compiling for AVX2
- improved performance of `(s)byte` matrix multiplication
- reduced code size of `(u)short` and up to `(s)byte8` vector-by-vector division and `divrem` functions (and improved performance if compiling for SSE2)
- reduced code size and improved performance of `isinrange` functions for `(u)long` vector types
- reduced code size of `ushort` vector `>=` and `<=` operators for SSE2 fallback code by ~75%
- improved performance and reduced code size of SSE2 down-casting fallback code

### Changes

- API BREAKING CHANGE: The various bool to integer/floating point conversion functions (`touint8`/`tof32` etc.) are now renamed to contain C# types in their names (`tobyte`/`tofloat` etc.)
- API BREAKING CHANGE: If you use this library as intended, meaning you import it and `Unity.Mathematics.math` statically (`using static MaxMath.maxmath;`), and you use the `pow` functions with scalar bases and scalar exponents in those scripts, you will encounter the first ever function call resolution ambiguity. It is strongly recommended to always use the `maxmath.pow` function, because it optimizes any `pow` call enormously if the exponent is a compile time constant. This does NOT mean that such a call must declare the exponent as a literal value - the exponent may become a compile time constant due to constant propagation
- `quarter` is now a readonly struct
- `quarter` to `sbyte`, `short`, `int` and `long` conversions are now required to be declared explicitly
- removed `countbits(void* ptr, ulong bytes)` from the library and added it to https://github.com/MrUnbelievable92/SIMD-Algorithms with more options

### Fixed Oversights

- (Issue #3) added constructor wrappers to the maxmath class analogous to `Unity.Mathematics` (`byte4 myByte4 = (maxmath.)byte4(1, 2, 3, 4);`)
- added `dsub` - fused divide-subtract function for scalar and vector `float` types
- added an optional `bool fast = false` parameter to `dad`, `dsub`, `dadsub` and `dsubadd` functions
- added `andnot` function overloads for scalar and vector `bool` types
- added implicit type conversions of scalar `quarter` values to `half`, `float` and `double` vectors
- added `all_eq` and `all_dif` functions for vectors of size 2
- added `all_eq` and `all_dif` functions for `float` and `double` vectors
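The saturation semantics described in the V2.2.0 additions can be illustrated with a minimal scalar sketch in plain C#. It is a reference for the clamping behavior only, not the library's implementation: the class and method names (`SaturationSketch`, `AddSaturated`, etc.) are hypothetical, and the real functions are vectorized and may map to single hardware instructions.

```csharp
using System;

// Hypothetical reference semantics for saturation arithmetic on sbyte/byte.
// Not MaxMath code; all names here are illustrative assumptions.
public static class SaturationSketch
{
    // Clamp a 32-bit intermediate result into [lo, hi].
    private static int Clamp(int value, int lo, int hi)
    {
        return Math.Max(lo, Math.Min(hi, value));
    }

    // The operands are widened to int, so the sum itself cannot overflow.
    public static sbyte AddSaturated(sbyte a, sbyte b)
    {
        return (sbyte)Clamp(a + b, sbyte.MinValue, sbyte.MaxValue);
    }

    public static sbyte SubSaturated(sbyte a, sbyte b)
    {
        return (sbyte)Clamp(a - b, sbyte.MinValue, sbyte.MaxValue);
    }

    // The only signed division that leaves the range is MinValue / -1:
    // -128 / -1 = 128, which is clamped to 127.
    public static sbyte DivSaturated(sbyte a, sbyte b)
    {
        return (sbyte)Clamp(a / b, sbyte.MinValue, sbyte.MaxValue);
    }

    // Saturating cast to a type with a smaller range.
    public static byte CastSaturated(int value)
    {
        return (byte)Clamp(value, byte.MinValue, byte.MaxValue);
    }
}
```

With wrapping arithmetic, `(sbyte)(100 + 100)` yields -56; the saturated sum stays at `sbyte.MaxValue` instead.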
v2.1.2

### Known Issues

- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard - Unity has not yet reacted to my bug report regarding their "half" implementation

### Fixes

- fixed undefined behavior of "vshr" functions for vector types smaller than 128 bits
- fixed SSE2 implementations of "vrol" and "vror" functions for the (u)short16 type

### Additions

- implemented Bmi1 and Bmi2 intrinsics as functions with a "bits_" prefix (except for "andn", which has already been implemented as "andnot")
- added high performance and/or SIMD "isdivisible" functions for all integer vector types and scalar value types
- added high performance and/or SIMD "intpow" - integer exponentiation - functions for (u)int, (u)long and all integer vector types
- added high performance and/or SIMD "floorpow2" functions for all integer vector types
- added "nabs" - negative absolute value functions for all non-boolean vector and single value types
- added "indexof(vector v, value x)" functions for all non-boolean vector types

### Improvements

- aggressively optimized away global variables (shuffle masks) and thus memory access and usage where appropriate
- improved performance of 256 bit vector subvector getters
- added Sse2 fallback code for all (u)long2/3/4 operators
- improved performance of multiplication, division and modulo operations for all (s)byte and (u)short vector and matrix types when dividing by a single non-compile time constant value
- added overloads for (s)byte and (u)short vectors' "divrem" functions with a scalar value as the divisor parameter, improving performance when it is a compile time constant
- improved performance of "intsqrt" functions for most types

### Changes

- bump com.unity.burst to version 1.5

### Fixed Oversights

- added bitmask8 and bitmask16 functions for (s)byte and (u)short vector types, respectively
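As an illustration of what integer exponentiation ("intpow") computes, here is a minimal scalar sketch using exponentiation by squaring. This is a textbook technique chosen for the example; the notes above do not say how the library implements it, and the names `IntPowSketch` and `IntPow` are hypothetical.

```csharp
// Hypothetical scalar sketch of integer exponentiation by squaring.
// Not MaxMath code; overflow wraps around, as in default unchecked C#.
public static class IntPowSketch
{
    public static int IntPow(int x, uint n)
    {
        int result = 1;

        while (n != 0)
        {
            // Multiply in the current power of x if this exponent bit is set.
            if ((n & 1) != 0)
            {
                result *= x;
            }

            x *= x;   // x, x^2, x^4, x^8, ...
            n >>= 1;  // move on to the next exponent bit
        }

        return result;
    }
}
```

The loop runs once per bit of the exponent, so `IntPow(3, 4)` needs three iterations rather than four multiplications in sequence.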
Hotfix

### Known Issues

- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard - Unity has not yet reacted to my bug report regarding their "half" implementation

### Fixes

- fixed a Burst compilation error triggered by "Sse4_1.blend_epi16" when compiling for SSE2 due to fallback code not using a constant value for "imm8"
- fixed incorrect CPU feature checks for quarter vector type-conversion code when compiling for SSE2
- fixed "tzcnt" implementations (were completely broken)
- fixed scalar (single value and C# fallback) "lzcnt" implementations for (s)byte and (u)short values and (u)long4 vectors

### Additions

- added "ulong countbits(void* ptr, ulong bytes)", which counts the number of 1-bits in a given block of memory, using Wojciech Mula's SIMD population count algorithm
- added high performance and/or SIMD "gcd" a.k.a. greatest common divisor functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "lcm" a.k.a. least common multiple functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "intsqrt" - integer square root (floor(sqrt(x))) functions for all integer and integer vector types, with the functions for signed integers and vectors throwing an ArgumentOutOfRangeException in case a value is negative

### Improvements

- performance improvements of "avg" functions for signed integer vectors
- added SIMD implementations of the "transpose" functions for all matrix types
- added SSE4 and SSE2 fallback code for variable bitshifts ("shl", "shrl" and "shra")
- added SSE2 fallback code for (s)byte vector-by-vector division and modulo operations
- added SSE2 fallback code for "all_dif" for (s)byte16, (u)short8 and (u)int8 vectors
- added SSE2 fallback code for typecasting, propagating through the entire library
- added SSE2 fallback code for "addsub" and "subadd" functions
- bitmask32 and bitmask64 now allow for masks to be up to 32 and 64 bits wide, respectively

### Changes

- renamed "BurstCompilerException" to "CPUFeatureCheckException"
- "shl", "shrl" and "shra" now have undefined behavior when shifting by an amount outside of the interval [0, 8 * sizeof(integer_type) - 1], for performance reasons and because of differences between SSE, AVX and managed C#

### Fixed Oversights

- added "shl", "shrl" and "shra" (varying per element) functions for (s)byte and (u)short vectors
- added "ror" and "rol" (varying per element) functions for (s)byte and (u)short vectors
- added "compareto" functions for all vector types except half and quarter vectors
- added "all_dif" functions for (s)byte32 vectors
- added vshr/l and vror/l functions for (s)byte32 and (u)short16 vectors

# Version 2.1.1

### Fixes

- fixed SSE2 "shl", "shrl" and "shra" implementations
- fixed SSE2 "intsqrt" implementations

### Improvements

- improved performance of (s)byte2, -3, -4, -8, -16 and (u)short2, -3, -4, -8 "gcd" functions (and thus "lcm") when compiling for Avx2
- improved performance of "tzcnt" and "lzcnt" implementations for all vector types if compiling for SSE4 or higher, propagating through a lot of the library

### Fixed Oversights

- Added documentation for RandomX methods
Release 2.1.0

### Known Issues

- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard - Unity has not yet reacted to my bug report regarding their "half" implementation

### Fixes

- fixed a Burst compilation error triggered by "Sse4_1.blend_epi16" when compiling for SSE2 due to fallback code not using a constant value for "imm8"
- fixed incorrect CPU feature checks for quarter vector type-conversion code when compiling for SSE2
- fixed "tzcnt" implementations (were completely broken)
- fixed scalar (single value and C# fallback) "lzcnt" implementations for (s)byte and (u)short values and (u)long4 vectors

### Additions

- added "ulong countbits(void* ptr, ulong bytes)", which counts the number of 1-bits in a given block of memory, using Wojciech Mula's SIMD population count algorithm
- added high performance and/or SIMD "gcd" a.k.a. greatest common divisor functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "lcm" a.k.a. least common multiple functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "intsqrt" - integer square root (floor(sqrt(x))) functions for all integer and integer vector types, with the functions for signed integers throwing an ArgumentOutOfRangeException in case a value is negative

### Improvements

- performance improvements of "avg" functions for signed integer vectors
- added SIMD implementations of the "transpose" functions for all matrix types
- added SSE4 and SSE2 fallback code for variable bitshifts ("shl", "shrl" and "shra")
- added SSE2 fallback code for (s)byte vector-by-vector division and modulo operations
- added SSE2 fallback code for "all_dif" for (s)byte16, (u)short8 and (u)int8 vectors
- added SSE2 fallback code for typecasting, propagating through the entire library
- added SSE2 fallback code for "addsub" and "subadd" functions
- bitmask32 and bitmask64 now allow for masks to be up to 32 and 64 bits wide, respectively

### Changes

- renamed "BurstCompilerException" to "CPUFeatureCheckException"
- "shl", "shrl" and "shra" now have undefined behavior when shifting by an amount beyond [0, 8 * sizeof(integer_type)], for performance reasons and because of differences between SSE, AVX and managed C#

### Fixed Oversights

- added "shl", "shrl" and "shra" functions for (s)byte and (u)short vectors
- added "ror" and "rol" (varying per element) functions for (s)byte and (u)short vectors
- added "compareto" functions for all vector types except half and quarter vectors
- added "all_dif" functions for (s)byte32 vectors
- added vshr/l and vror/l functions for (s)byte32 and (u)short16 vectors
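The contracts stated for "gcd" (always an unsigned result) and "intsqrt" (floor(sqrt(x)), with an ArgumentOutOfRangeException for negative signed input) can be written down as a short scalar sketch. The algorithms used here (Euclid's algorithm and an integer Newton iteration) are textbook choices for the example, not a description of the library's SIMD code, and the names `IntegerMathSketch`, `Gcd` and `IntSqrt` are hypothetical.

```csharp
using System;

// Hypothetical scalar reference for the gcd / intsqrt contracts.
// Not MaxMath code; names and algorithms are illustrative assumptions.
public static class IntegerMathSketch
{
    // Greatest common divisor of two signed values, returned as unsigned.
    public static uint Gcd(int a, int b)
    {
        // Widen before Math.Abs so that int.MinValue does not overflow.
        uint x = (uint)Math.Abs((long)a);
        uint y = (uint)Math.Abs((long)b);

        // Euclid's algorithm.
        while (y != 0)
        {
            uint remainder = x % y;
            x = y;
            y = remainder;
        }

        return x;
    }

    // floor(sqrt(x)) for a non-negative integer.
    public static int IntSqrt(int x)
    {
        if (x < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(x), "value must not be negative");
        }
        if (x < 2)
        {
            return x;
        }

        // Newton iteration on integers, starting from an overestimate.
        // The estimate decreases monotonically until it reaches the floor.
        ulong value = (ulong)x;
        ulong root = value;
        ulong next = (root + value / root) / 2;

        while (next < root)
        {
            root = next;
            next = (root + value / root) / 2;
        }

        return (int)root;
    }
}
```

Returning an unsigned type from `Gcd` matters for inputs such as `int.MinValue`, whose absolute value does not fit into an `int`.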
Re-Release

### Re-Release Notes

- Version 2.0.0 adds - for the first time - fallback procedures from Avx2 to Sse4, Sse2 and platform independent instruction sets, respectively, with some major optimizations for all of them
- ARM and other instruction sets do _NOT_ have fallback procedures written for them, and there are no plans for it at this time. Burst/LLVM are good at recognizing the patterns in the code, though, and some of the code will be vectorized on other platforms (confirmed)

### Known Issues

- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard - Unity has not yet reacted to my bug report regarding their "half" implementation

### Fixes

- fixed incorrect bool4 subvector getters of the bool8 type

### Improvements

- removed "fixed" vector element access to improve performance in managed C#

### Additions

- added "shuffle(vector, vector, ShuffleComponent(, ShuffleComponent)(, ShuffleComponent)(, ShuffleComponent))" functions for (s)byte, (u)short, (u)long, quarter and half vectors

### Changes

- Bump com.unity.burst to version 1.4.4

### Fixed Oversights

- Added "addsub" function for floating point types, complementary to "subadd"
- Added "addsub" and "subadd" functions for integer types
1.2.0 Release

# Known Issues

- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard - Unity has not yet reacted to my bug report regarding their "half" implementation.

# Fixes

- Added preliminary safety cast to a float of the half value in toboolsafe() until Unity fixes their half '==' and '!=' operators according to IEEE 754

# Additions

### "quarter" precision floats and vectors

- "quarter" is an 8-bit IEEE 754 1.3.4.-3 floating point value, often called a "minifloat"
- It has a very limited range of [-15.5, 15.5] with an epsilon of 0.015625. All integers, as well as i + 0.5, within that range can be represented as a quarter
- Type conversion from and to quarters also conforms to the IEEE 754 standard. In detail, casting to a quarter performs rounding according to a) its precision and b) whether or not the more precise value is closer to 0 or to quarter.Epsilon. NaN and +/- zero preservation, as well as preservation/clamping to +/- infinity, was also implemented
- "==" and "!=" operators for vectors conforming to the IEEE 754 standard were implemented (unlike, currently, Unity's "half" type). All the other boolean and arithmetic operators were implemented for the base type only, which will return single precision results (for arithmetic operations). For vectors, quarter vectors are to be (implicitly) cast to single precision vectors first, until/if Unity changes their "half" implementation.
- Type conversions from and to all other single value and vector types were implemented
- Full function implementation within the library was added, including: abs(), isnan(), isinf(), isfinite(), select(), as[s]byte/asquarter(), vrol/r(), vshl/r(), tobool[safe]() and toquarter[safe]()

### Fixed Oversights

- Added missing type conversions from and to half8 for (s)byte8, (u)short8 and (u)int8 vectors
- Added missing type conversions from and to half8 for booleans and boolean vectors
- Added half "select" functions
- Improved the performance of unsafe boolean-to-half/float/double functions
- added (preliminary?) "abs", "isnan", "isinf" and "isfinite" for half and half vectors, eliminating unnecessary casting
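To make the "1.3.4.-3" notation concrete, the following sketch decodes an 8-bit pattern into a `float`, reading the notation as 1 sign bit, 3 exponent bits, 4 mantissa bits and an exponent bias of 3. The bit order (sign in the most significant bit, mantissa in the low four bits) is an assumption based on the usual IEEE 754 layout, and `QuarterSketch` and `BitsToFloat` are hypothetical names; none of this is the library's conversion code.

```csharp
// Hypothetical decoder for an 8-bit 1.3.4 minifloat with exponent bias 3.
// Not MaxMath code; the bit layout is an assumption stated above.
public static class QuarterSketch
{
    public static float BitsToFloat(byte bits)
    {
        int sign = (bits >> 7) & 0x1;
        int exponent = (bits >> 4) & 0x7;
        int mantissa = bits & 0xF;

        float magnitude;

        if (exponent == 0x7)
        {
            // All exponent bits set: infinity or NaN.
            magnitude = mantissa == 0 ? float.PositiveInfinity : float.NaN;
        }
        else if (exponent == 0)
        {
            // Subnormal: (mantissa / 16) * 2^(1 - 3) = mantissa / 64.
            magnitude = mantissa / 64f;
        }
        else
        {
            // Normal: (1 + mantissa / 16) * 2^(exponent - 3)
            //       = (16 + mantissa) * 2^exponent / 128.
            magnitude = (16 + mantissa) * (1 << exponent) / 128f;
        }

        return sign == 0 ? magnitude : -magnitude;
    }
}
```

Under these assumptions the largest finite pattern, `0x6F`, decodes to 15.5 and the smallest positive pattern, `0x01`, decodes to 0.015625, which matches the range and epsilon given in the list above.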