
Tags: microsoft/Accera


v1.2.29

Squashed commit of the following:

commit 5ec0fc859f017654144b33cfad92bbae62391088
Author: Captain Jack Sparrow <[email protected]>
Date:   Mon Apr 17 18:37:24 2023 +0000

    Merged PR 3211: Upgrade hatlib dependency to 0.0.39

    Upgrade hatlib dependency to 0.0.39

commit 38642006cbc8c4ff01c7345d018f9a8233454dbd
Author: Mason Remy <[email protected]>
Date:   Fri Apr 14 19:27:01 2023 +0000

    Merged PR 3209: Support AffineParallelOp and scf::ParallelOp in RangeValue utils

    Support AffineParallelOp and scf::ParallelOp in RangeValue utils

commit addb45a1a4ccb50657b822591735916be83498c5
Author: Captain Jack Sparrow <[email protected]>
Date:   Wed Apr 12 17:25:02 2023 +0000

    Merged PR 3207: Fix parallelization and enable file checker in tests

    Fix parallelization and enable file checker in tests

commit 7e206532932ff603decfd46656173702ebdceff5
Author: Lisa Ong <[email protected]>
Date:   Wed Apr 12 08:02:20 2023 +0000

    Merged PR 3195: [LLVM 15] progressive upgrade (24a37a396a9b), disable macos builds

    The first of a series of progressive upgrades from LLVM 14.0.6 to LLVM 15.0.7 (and possibly beyond).

    Current LLVM version:
    https://intelligentdevices.visualstudio.com/ELL/_git/accera.llvm?version=GBaccera/llvmorg-15-24a37a396a9b&_a=history

    This is llvmorg-15.0.0-init, fast-forwarded by about 100 "relevant" MLIR commits (the actual number of commits is higher).

    Performance on AVX2 is verified for Windows (no regressions).

    **Breaking Change: macOS builds**
    With this upgrade we are also retiring the macOS pipelines due to lack of build resources for LLVM macOS/Intel Conan packages. This only affects internal developer scenarios; public developers continue to rely on vcpkg builds.

commit 2927234171f8e6c960f654909f8ec0a2c19e3c54
Author: Kern Handa <[email protected]>
Date:   Fri Apr 7 17:20:42 2023 +0000

    Merged PR 3172: Adds better support for compiling specifically for AVX2 targets

    * Plumb AVX2 flags to LLVM, with a block for macOS. We plan to remove official support for macOS/Intel starting from LLVM 15 due to limited build resources.
    * Initialize Target.HOST extensions using cpu_info
    * Added more AVX2 filecheck tests to catch LLVM lowering regressions before moving to LLVM 15 [MasonR]

    **Breaking Change**:  Target.HOST no longer unconditionally enables the AVX2 extension if the underlying CPU does not support it, otherwise codegen may result in unsupported instructions.

    To compile for AVX2 if your host doesn't support AVX2, specify Target("<some known AVX2 model name>"). For example, `plan = schedule.create_plan(Target("Intel 6700"))`

commit 6822bcb1fd222fe5b7e7292a9f7d1f35bcf1fdce
Author: Denny Sun <[email protected]>
Date:   Thu Apr 6 21:47:01 2023 +0000

    Merged PR 3203: Plumb target device info into llvm lowering

    LLVM lowering currently depends on static compiler macros to check target device info, which breaks cross-compilation support:

    ```
    // TODO: get check `TargetDeviceInfo` for the OS instead
    ```

    ```
    const int hostBitSize = 64; // TODO:: FIXME :: This assumes that the host is always 64bit
    // Should query the target hardware
    auto llvmIntTy = hostBitSize == 32 ? llvmI32Ty : llvmI64Ty;
    ```
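    For illustration, the host's native bit width can be queried at runtime instead of hardcoded; a minimal Python sketch of the idea (the actual fix lives in the C++ lowering, which should query the target hardware instead):

    ```python
    import struct

    def host_bit_size() -> int:
        """Return the native pointer width of the running process (32 or 64)."""
        return struct.calcsize("P") * 8

    # Choose an integer width matching the host instead of assuming 64-bit.
    llvm_int_width = 32 if host_bit_size() == 32 else 64
    ```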

v1.2.28

Squashed commit of the following:

commit 30e07df5e704ef85668093b7350bfdff1a24a7c8
Author: Captain Jack Sparrow <[email protected]>
Date:   Mon Apr 3 20:38:05 2023 +0000

    Merged PR 3199: Rename _slice to slice and add docs

    Rename _slice to slice and add docs

commit 52491f28481ec9ca555c563eaca249ce7d621ad1
Author: Captain Jack Sparrow <[email protected]>
Date:   Mon Apr 3 06:05:52 2023 +0000

    Merged PR 3197: Preserve dest memref shape during SliceOp to SubViewOp lowering

    Preserve dest memref shape during SliceOp to SubViewOp lowering:

    Without this change, subview op would discard the dest memref type required by the slice op. For example,

    ```
    %7 = "accv.slice"(%arg0, %6) {sliceDimensions = [0]} : (memref<1x30x256xui8>, index) -> memref<30x256xui8, affine_map<...>>
    ```

    would get lowered to:

    ```
    %4 = memref.subview %arg0[%3, 0, 0] [1, 30, 256] [1, 1, 1] : memref<1x30x256xui8> to memref<1x30x256xui8, affine_map<...>>
    %5 = memref.cast %4 : memref<1x30x256xui8, affine_map<...>> to memref<?x?x?xui8, affine_map<...>>
    ```
    which does not drop the first dimension as expected. With this fix, the slice op correctly lowers to:
    ```
    %4 = memref.subview %arg0[%3, 0, 0] [1, 30, 256] [1, 1, 1] : memref<1x30x256xui8> to memref<30x256xui8, affine_map<...>>
    %5 = memref.cast %4 : memref<30x256xui8, affine_map<...>> to memref<30x256xui8, affine_map<...>>
    ```
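    The expected slice semantics match basic indexing along the sliced dimension; a NumPy sketch of the intended shape behavior (a semantic model, not the Accera implementation):

    ```python
    import numpy as np

    # A source array shaped like the memref<1x30x256xui8> in the example.
    src = np.zeros((1, 30, 256), dtype=np.uint8)

    # Slicing dimension 0 at a given index must drop that dimension,
    # yielding the dest shape memref<30x256xui8>.
    dst = src[0]
    assert dst.shape == (30, 256)
    ```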

commit 79b6fba2b083e4f38b4b9b5f86d134ebbaf604de
Author: Denny Sun <[email protected]>
Date:   Mon Apr 3 01:02:02 2023 +0000

    Merged PR 3194: Reorder the ops in GetTimeOpLowering to improve the timing accuracy

    In order to get the most accurate timing, we need to order the operations more appropriately:

    ```
    from
            Independent logic
            GetTime()
            Independent logic
            Main logic to profile
            Independent logic
            GetTime()
            Independent logic

    to

            Independent logic
            Independent logic
            GetTime()
            Main logic  to profile
            GetTime()
            Independent logic
            Independent logic
    ```
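    The same principle applies to any wall-clock measurement: move unrelated work outside the timed region so the two timestamps bracket only the logic being profiled. A minimal Python sketch (illustrative only; the commit reorders MLIR ops, not Python code):

    ```python
    import time

    def profile(main_logic, *independent_work):
        # Run independent work first so it is excluded from the measurement.
        for work in independent_work:
            work()
        start = time.perf_counter()            # GetTime()
        main_logic()                           # Main logic to profile
        return time.perf_counter() - start     # GetTime()

    elapsed = profile(lambda: sum(range(1000)), lambda: None)
    assert elapsed >= 0.0
    ```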

commit a24f82d514d8ebd5b06de4f5c36d2a13601f4ebe
Author: Denny Sun <[email protected]>
Date:   Thu Mar 30 03:47:56 2023 +0000

    Merged PR 3187: Fully dynamic split_dimension op

    This change enables Accera to split a dynamic dimension by a dynamic size:

    ```
            M, N, MN = create_dimensions()

            Input = Array(role=Role.INPUT, element_type=ScalarType.float32, shape=(MN, ))
            Output = Array(role=Role.INPUT_OUTPUT, element_type=ScalarType.float32, shape=(M, N))

            nest = Nest(shape=(M, N))
            i, j = nest.get_indices()

            @nest.iteration_logic
            def _():
                split_input = Input._split_dimension(0, N)
                Output[i, j] = split_input[i, j]

            package.add(nest, args=(MN, M, N, Input, Output), base_name=f"{test_name}_fn")
    ```
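    Semantically, splitting a dimension of extent MN by size N is a reshape to (MN // N, N); a NumPy sketch of the expected behavior (NumPy does not model Accera's dynamic dimensions, so concrete sizes stand in for M, N, MN):

    ```python
    import numpy as np

    MN, N = 12, 4
    M = MN // N

    flat = np.arange(MN, dtype=np.float32)   # Input with shape (MN,)
    split = flat.reshape(M, N)               # Input._split_dimension(0, N)
    assert split.shape == (M, N)
    assert split[2, 1] == flat[2 * N + 1]    # same element, split indexing
    ```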

commit fe1955c975c3597afd6167203a4c9b7ef7cf4d9b
Author: Kern Handa <[email protected]>
Date:   Wed Mar 29 21:18:06 2023 +0000

    Merged PR 3185: [nfc] Adds tests for vectorization, fast_exp_sum

commit 0f7daceebbfb7382c64678a624955c3c06e81765
Author: Captain Jack Sparrow <[email protected]>
Date:   Wed Mar 29 05:38:53 2023 +0000

    Merged PR 3168: [docs] Tensorization tutorials and type name updates

v1.2.27

Squashed commit of the following:

commit d6e136246b0dfc824c8111827cc6a0f166d3e2ea
Author: Mason Remy <[email protected]>
Date:   Sat Mar 25 02:32:07 2023 +0000

    Merged PR 3181: Fix bug with reinterpret_cast of partially-dynamic array

    Fix bug with reinterpret_cast of partially-dynamic array

commit a4e8fa8874191c5c58475727937cfab4951427fc
Author: Mason Remy <[email protected]>
Date:   Fri Mar 24 22:19:01 2023 +0000

    Merged PR 3180: Enable getting a memref shape from a memref_cast result

    Enable getting a memref shape from a memref_cast result

commit f3df546c84fd1c86d93c357811d43e34e35ae215
Author: Lisa Ong <[email protected]>
Date:   Fri Mar 24 17:28:30 2023 +0000

    Merged PR 3179: Fix vulkan-specific smoke test break

    The import for test_vulkan_gpu_matmul() was missing. This test code path is only exercised when Vulkan is installed.

    ```
            format = self.PACKAGE_FORMAT if "VULKAN_SDK" in os.environ else Package.Format.HAT_STATIC
            with verifiers.VerifyPackage(self, "test_vulkan_gpu_matmul", TEST_PACKAGE_DIR):
                package.build(
                    name="test_vulkan_gpu_matmul", format=format, mode=self.PACKAGE_MODE, output_dir=TEST_PACKAGE_DIR
                )
    ```

v1.2.26

Squashed commit of the following:

commit 40dffe83929973c8e205c395be5db23c360c2397
Author: Denny Sun <[email protected]>
Date:   Thu Mar 23 05:06:51 2023 +0000

    Merged PR 3176: [Accera] split_dim op supports dynamic dims with static split size

    With this fix, the following test case, which has dynamic dimensions with a static split size, succeeds:

    ```
            M, MN = create_dimensions()
            N = 16

            Input = Array(role=Role.INPUT, element_type=ScalarType.float32, shape=(MN,))
            Output = Array(role=Role.INPUT_OUTPUT, element_type=ScalarType.float32, shape=(M, N))

            nest = Nest(shape=(M, N))
            i, j = nest.get_indices()

            @nest.iteration_logic
            def _():
                split_input = Input._split_dimension(0, cast(16, ScalarType.index))
                Output[i, j] = split_input[i, j]
    ```

commit 451b67405d77ebbe1cf0722f9c7aeb191c3b4beb
Author: Mason Remy <[email protected]>
Date:   Thu Mar 23 01:19:37 2023 +0000

    Merged PR 3174: Ensure any dynamic allocations are heap allocs that get dealloced

    Ensure any dynamic allocations are heap allocs

commit 602b068f19cdf1ff9111aacbeb5d704974521ebd
Author: Kern Handa <[email protected]>
Date:   Wed Mar 22 20:59:43 2023 +0000

    Merged PR 3171: [test] Add some tests for Dimensions

commit ccd1f5c39964fbe96815672f137b5185ee2e9885
Author: Mason Remy <[email protected]>
Date:   Wed Mar 22 19:41:02 2023 +0000

    Merged PR 3175: Support reinterpret cast of same bitwidth without changing layout

    Support reinterpret cast of same bitwidth without changing layout

commit 270a3c8a9c1e1c06b0da3c9d61fa2f04438e3076
Author: Kern Handa <[email protected]>
Date:   Fri Mar 17 22:16:08 2023 +0000

    Merged PR 3167: Remove hack to treat INPUT_OUTPUT Arrays with shape (1,) as Elements

    I don't have complete context on this, so this might break something. If it does, that should be fixed separately rather than keep this hack around, which breaks semantics in non-obvious ways.

commit efcff61727c64e7f0a37f4f92c701bc47ea1c470
Author: Lisa Ong <[email protected]>
Date:   Fri Mar 17 08:09:07 2023 +0000

    Merged PR 3165: [build] Fix clang 14 release build warnings treated as errors on macOS/Apple

    Errors are showing up on release builds:

    ```
    cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja
    cmake --build . --config Release
    ```

    Clang version:
    ```
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    Target: arm64-apple-darwin22.3.0
    Thread model: posix
    ```

commit 43f311aa706214243ce8d7acca7d29993bb7003b
Author: Lisa Ong <[email protected]>
Date:   Fri Mar 17 07:02:09 2023 +0000

    Merged PR 3162: Bump vcpkg to latest release

    Last release was Sept 2022. Update to the latest tag (2023.02.24)

    Preparation for LLVM 15 upgrade

commit 07098f502596d997bbe241e95f1130c11e318220
Author: Mason Remy <[email protected]>
Date:   Thu Mar 16 23:27:04 2023 +0000

    Merged PR 3161: Fix cache reduce scale constant hoisting

    Fix cache reduce scale constant hoisting

commit 696ef0df5947067f94b64255e00b7fffc4c04f9d
Author: Mason Remy <[email protected]>
Date:   Thu Mar 16 20:54:22 2023 +0000

    Merged PR 3163: Extend vector masked loads/stores to handle arbitrary bin ops and constant operands

    Extend vector masked loads/stores to handle arbitrary bin ops and
    constant operands

v1.2.25

Squashed commit of the following:

commit 18db5586d1a45f65fd98ea1a21d5fb87db5d2dbf
Author: Lisa Ong <[email protected]>
Date:   Thu Mar 16 03:46:54 2023 +0000

    Merged PR 3160: [security] bump onnx to 1.13.0

    This resolves a high severity dependabot alert

commit 07d16bf787bffa3be93dd7902a402e7e5e660596
Author: Mason Remy <[email protected]>
Date:   Thu Mar 16 02:17:51 2023 +0000

    Merged PR 3157: Dynamic split dim tests

    Dynamic split dim tests

commit 7c5b9a18adbba2ec10461118fb061365e34f5ed0
Author: Denny Sun <[email protected]>
Date:   Wed Mar 15 01:47:45 2023 +0000

    Merged PR 3158: Do not unroll the profiling ops when vectorization enabled

    When vectorization is enabled, the ops in the kernel get unrolled. For example, without this fix the timer added to the inner kernel would have 8 copies, which is clearly wrong.

commit df217f2e731c2609674da57662eaf1ed6b4a40b0
Author: Denny Sun <[email protected]>
Date:   Mon Mar 13 06:18:41 2023 +0000

    Merged PR 3153: Fix the lowering issue of the profiling ops

    With this fix, the kernel-level profiling support works end to end. Here is an example of how to use it:

    ```
            @tile_nest.iteration_logic
            def _tile_logic():
                EnterProfileRegion("pack_b_fn_outer")
                pack_b_fn(B, B_temp, j, k)
                ExitProfileRegion("pack_b_fn_outer")

                EnterProfileRegion("matmul_fn_outer")
                matmul_fn(A, B, C, B_temp, i, j, k)
                ExitProfileRegion("matmul_fn_outer")

                PrintProfileResults()
    ```

    The timings printed out look like:

    ```
    matmul_fn_outer 1       0.000100 ms
    pack_b_fn_outer 1       0.000400 ms
    matmul_fn_outer 2       0.000400 ms
    pack_b_fn_outer 2       0.001200 ms
    matmul_fn_outer 3       0.000600 ms
    pack_b_fn_outer 3       0.001700 ms
    matmul_fn_outer 4       0.000800 ms
    pack_b_fn_outer 4       0.002300 ms
    matmul_fn_outer 5       0.000900 ms
    pack_b_fn_outer 5       0.002700 ms
    matmul_fn_outer 6       0.001200 ms
    pack_b_fn_outer 6       0.003200 ms
    matmul_fn_outer 7       0.001500 ms
    pack_b_fn_outer 7       0.003700 ms
    matmul_fn_outer 8       0.001700 ms
    pack_b_fn_outer 8       0.004000 ms
    matmul_fn_outer 9       0.002000 ms
    pack_b_fn_outer 9       0.004500 ms
    matmul_fn_outer 10      0.002200 ms
    pack_b_fn_outer 10      0.004800 ms
    matmul_fn_outer 11      0.002400 ms
    pack_b_fn_outer 11      0.005300 ms
    matmul_fn_outer 12      0.002700 ms
    pack_b_fn_outer 12      0.006500 ms
    matmul_fn_outer 13      0.003100 ms
    pack_b_fn_outer 13      0.007400 ms
    matmul_fn_outer 14      0.003400 ms
    pack_b_fn_outer 14      0.007800 ms
    matmul_fn_outer 15      0.003700 ms
    pack_b_fn_outer 15      0.008300 ms
    matmul_fn_outer 16      0.004000 ms
    pack_b_fn_outer 16      0.008800 ms
    matmul_fn_outer 17      0.004400 ms
    pack_b_fn_outer 17      0.009199 ms
    matmul_fn_outer 18      0.004800 ms
    pack_b_fn_outer 18      0.009599 ms
    matmul_fn_outer 19      0.005100 ms
    pack_b_fn_outer 19      0.010099 ms
    matmul_fn_outer 20      0.005400 ms
    pack_b_fn_outer 20      0.010599 ms
    matmul_fn_outer 21      0.006000 ms
    pack_b_fn_outer 21      0.011299 ms
    matmul_fn_outer 22      0.006300 ms
    pack_b_fn_outer 22      0.011899 ms
    matmul_fn_outer 23      0.006500 ms
    pack_b_fn_outer 23      0.012299 ms
    matmul_fn_outer 24      0.006701 ms
    pack_b_fn_outer 24      0.012699 ms
    matmul_fn_outer 25      0.006901 ms
    pack_b_fn_outer 25      0.013099 ms
    matmul_fn_outer 26      0.007101 ms
    pack_b_fn_outer 26      0.013399 ms
    matmul_fn_outer 27      0.007300 ms
    pack_b_fn_outer 27      0.013799 ms
    matmul_fn_outer 28      0.007401 ms
    pack_b_fn_outer 28      0.014100 ms
    matmul_fn_outer 29      0.007601 ms
    pack_b_fn_outer 29      0.014600 ms
    matmul_fn_outer 30      0.007801 ms
    pack_b_fn_outer 30      0.015000 ms
    matmul_fn_outer 31      0.007901 ms
    pack_b_fn_outer 31      0.015399 ms
    matmul_fn_outer 32      0.008101 ms
    pack_b_fn_outer 32      0.015699 ms
    matmul_fn_outer 33      0.008301 ms
    pack_b_fn_outer 33      0.015999 ms
    matmul_fn_outer 34      0.008601 ms
    pack_b_fn_outer 34      0.016...

commit 3572c2b081198e1631f2df208c07490c6d4b4bf5
Author: Lisa Ong <[email protected]>
Date:   Fri Mar 10 10:57:39 2023 +0000

    Merged PR 3152: [nfc] [test] Skip fast_exp mlas tests on unsupported Aarch64

    These tests generate `llvm.x86.avx.max.ps.256`, which is not supported on non-Intel processors such as the Apple M1.

    ```
      %28 = load <8 x float>, <8 x float>* %27, align 4, !dbg !19
      %29 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %28, <8 x float> <float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000>), !dbg !20
      %30 = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> %29, <8 x float> <float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000>, <8 x float> <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>), !dbg !21
      %31 = fsub <8 x float> %30, <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>, !dbg !22

    ```

v1.2.24

[doc] fixup version

v1.2.23

Squashed commit of the following:

commit 11e8fdae41e596d6102e46c37a22a26c94d7fe85
Author: Mason Remy <[email protected]>
Date:   Thu Mar 2 05:53:10 2023 +0000

    Merged PR 3131: Set masked load/store inbounds flag to true

    Set masked load/store inbounds flag to true

    The mask we generate, as well as the rest of our infrastructure, prevents out-of-bounds accesses when used properly. Therefore, for performance reasons, we don't want MLIR to generate runtime bounds checking.
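    The mask's role can be illustrated with a NumPy emulation of a masked vector load: lanes where the mask is false never touch memory and take a padding value instead (a semantic sketch, not the MLIR implementation):

    ```python
    import numpy as np

    def masked_load(buf, base, mask, padding=0):
        """Emulate a masked 8-lane load: only mask-enabled lanes read memory."""
        out = np.full(len(mask), padding, dtype=buf.dtype)
        for lane, enabled in enumerate(mask):
            if enabled:                      # disabled lanes never access memory
                out[lane] = buf[base + lane]
        return out

    buf = np.arange(5, dtype=np.int32)       # memref<5xi32>
    mask = [i < 5 for i in range(8)]         # generated mask guarantees bounds
    vec = masked_load(buf, 0, mask)
    assert vec.tolist() == [0, 1, 2, 3, 4, 0, 0, 0]
    ```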

commit 14a04925721ed575befc65e93e4670e27e4d1063
Author: Mason Remy <[email protected]>
Date:   Thu Mar 2 00:28:38 2023 +0000

    Merged PR 3130: Recognize and simplify always true EQ and NE CmpOps

    Recognize and simplify always true EQ and NE CmpOps

    These would already get simplified after converting to the builtin
    dialects, but this change makes the simplification happen earlier in the lowering.
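    The simplification amounts to constant-folding comparisons whose result is already known; a toy sketch of the rewrite (illustrative only, not the Accera pass):

    ```python
    def simplify_cmp(pred, lhs, rhs):
        """Fold an EQ/NE comparison when both operands are known constants."""
        if isinstance(lhs, int) and isinstance(rhs, int):
            if pred == "eq":
                return lhs == rhs
            if pred == "ne":
                return lhs != rhs
        return None  # not foldable; leave the op for later lowering

    assert simplify_cmp("eq", 4, 4) is True      # always-true EQ folds away
    assert simplify_cmp("ne", 4, 4) is False
    assert simplify_cmp("eq", 4, "x") is None    # symbolic operand: keep the op
    ```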

commit 91b76428c61a52d454ac5ae8fa6485edd9bdfbe5
Author: Mason Remy <[email protected]>
Date:   Wed Mar 1 23:46:29 2023 +0000

    Merged PR 3129: Optimize 1-row horizontal i16->i32 sum reduction

    Optimize 1-row horizontal i16->i32 sum reduction

commit be987bcf641c09dd43d959cc7e8a1b37d33ba591
Author: JUBI TANEJA <[email protected]>
Date:   Wed Mar 1 19:59:34 2023 +0000

    Merged PR 3118: vectorize accumulation of results of two masked load ops

    This PR vectorizes a pattern that occurs in MMIF where there are two conditional loads, followed by an accumulation operation, and a conditional store. On vectorizing the following DSL:
    ```
            N_input = 8
            N_output = 5
            Input = Array(role=Role.INPUT, element_type=ScalarType.int32, shape=(N_input, ))
            Output = Array(role=Role.INPUT_OUTPUT, element_type=ScalarType.int32, shape=(N_output, ))
            nest = Nest(shape=(N_input, ))
            i, = nest.get_indices()

            @nest.iteration_logic
            def _nest():

                def store_value():
                    Output[i] += Input[i]

                _If(i < N_output, store_value)
    ```
    It produces the following assembly. We are looking for `vpmaskmovd` instructions that correspond to vector.transfer_read/vector.transfer_write ops in MLIR.
    ```
    0000000000000030 <test_vectorized_masked_accumulate_3e5de44f3dcca64e>:
      30:   c5 fd 6f 05 00 00 00    vmovdqa 0x0(%rip),%ymm0        # 38 <test_vectorized_masked_accumulate_3e5de44f3dcca64e+0x8>
      37:   00
      38:   c4 e2 7d 8c 0e          vpmaskmovd (%rsi),%ymm0,%ymm1
      3d:   c4 e2 7d 8c 17          vpmaskmovd (%rdi),%ymm0,%ymm2
      42:   c5 ed fe c9             vpaddd %ymm1,%ymm2,%ymm1
      46:   c4 e2 7d 8e 0e          vpmaskmovd %ymm1,%ymm0,(%rsi)
      4b:   c5 f8 77                vzeroupper
      4e:   c3                      retq
    ```

commit 69b87522136cae60b0f5b4d62919a2ebd5577933
Author: Kern Handa <[email protected]>
Date:   Wed Mar 1 17:47:14 2023 +0000

    Merged PR 3126: [test] Adds more tests for vectorized transpose

    [test] Adds more tests for vectorized transpose

commit c4d81701faf3351218cd69726c487f642e4bfca0
Author: Mason Remy <[email protected]>
Date:   Wed Mar 1 06:48:35 2023 +0000

    Merged PR 3121: [nfc] Separate bounds checking into separate pass file

    [nfc] Separate bounds checking into separate pass file

    This removes the bounds checking code from
    ExecutionPlanToAffineLoweringPass and creates a separate pass file for
    it. There is no change in when and where the checking occurs (currently
    it only happens for caching-generated loads and stores).

    In a future change we will further separate the pass and run it at a
    different phase of the lowering and plumb controls for
    enabling/disabling it to the DSL

commit b221544937f8776d48a8f9daddf601378534705b
Author: Mason Remy <[email protected]>
Date:   Wed Mar 1 01:18:59 2023 +0000

    Merged PR 3122: Fix reinterpret_cast output memref shape

    Fix reinterpret_cast output memref shape

commit eb3582ba07cb4118f73bb630589f07de27ba9c50
Author: Mason Remy <[email protected]>
Date:   Fri Feb 24 23:51:30 2023 +0000

    Merged PR 3115: Normalize AffineForOps to have unit stride and begin at 0

    Normalize AffineForOps to have unit stride and begin at 0

commit 3ec2bd7f5353a4119294095eb5084a1e7a298051
Author: Mason Remy <[email protected]>
Date:   Fri Feb 24 22:26:13 2023 +0000

    Merged PR 3117: Vectorize horizontal multi-dim sum reductions

    Vectorize horizontal multi-dim sum reductions

    Recognizes and vectorizes these sum reductions:
      4x16xi16 -> 4x1xi32
      4x8xi32 -> 4x1xi32
      4x8xf32 -> 4x1xf32

commit 6f46df5ba99eeb237dcbbdda28a0975964af1186
Author: Kern Handa <[email protected]>
Date:   Fri Feb 24 11:13:45 2023 +0000

    Merged PR 3099: Adds pattern rewriting for AVX2 vectorized transpose

v1.2.22

Squashed commit of the following:

commit 1691b1d75b89703542514ab102fa2316a40d0ca4
Author: Mason Remy <[email protected]>
Date:   Thu Feb 23 20:16:51 2023 +0000

    Merged PR 3107: Make vectorization happen after inlining and simplification

    Make vectorization happen after inlining and simplification

    This change fills out the vectorization passes and removes vectorization
    from LoopNestToValueFunc. Some bugs were exposed that this also fixes.

    Since vectorization is now a separate pass, mlir filecheck lit tests can
    be run more easily. This change adds the initial file with one test, but
    we should continue expanding this test suite

commit 752615f6351db126e605666c72b309c5ccf436d6
Author: JUBI TANEJA <[email protected]>
Date:   Thu Feb 23 06:06:20 2023 +0000

    Merged PR 3108: extend vectorization for masked store case

commit ab0e23cb0fd7d9186b228ffe3462263eb4bdc3f0
Author: Mason Remy <[email protected]>
Date:   Wed Feb 22 20:04:51 2023 +0000

    Merged PR 3109: Set conan version < 2.0.0

    Our infra isn't set up for the new Conan 2 behavior, so pin our usage to
    version 1 until we take the upgrade intentionally.

commit 2737012b6f9441929accd9a180efe939dfeebf6f
Author: Captain Jack Sparrow <[email protected]>
Date:   Wed Feb 22 06:33:32 2023 +0000

    Merged PR 3104: Position fusing dim after the fused dimensions

    Position fusing dim after the fused dimensions

commit 15c45b048b453edea05fbfdde253a9acecb5f96a
Author: Chuck Jacobs <[email protected]>
Date:   Tue Feb 21 21:55:21 2023 +0000

    Merged PR 3096: Add "RelWithDebInfo"-like option to accc

    This PR adds another option to the `Options` flag for `AcceraProject.generate_and_emit` to keep some debug info (the frame pointers) around when building the Accera project. This can be helpful when trying to interpret perf profiler output.

v1.2.21

Squashed commit of the following:

commit d285212ec95c3e000a0df7e0b769c39c7f343e1b
Author: Lisa Ong <[email protected]>
Date:   Mon Feb 20 09:11:19 2023 +0000

    Merged PR 3101: [build] install pkg-config for macos buddy builds

    Fixes macos packaging build failure:

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=47235&view=results

commit 043e1b3b60a8a7b1c1696bc1975f2bb0c4d6e146
Author: Mason Remy <[email protected]>
Date:   Mon Feb 20 06:42:16 2023 +0000

    Merged PR 3098: [nfc] Move vectorization code to separate files

    [nfc] Move vectorization code to separate files

    Moves vectorization code out of ExecutionPlanToAffineLoweringPass in
    preparation for separating out a vectorization pass that can run later
    in the pipeline than vectorization currently does.

commit 93318d78008484b58f76ce07d394d3ecece1db62
Author: Kern Handa <[email protected]>
Date:   Sat Feb 18 01:59:58 2023 +0000

    Merged PR 3100: Adds CMake dependencies to acc-translate to ensure correct build

    Adds CMake dependencies to acc-translate to ensure correct build

commit 2180ef529673d8afc9e371c0153e9ebef7a81a31
Author: Mason Remy <[email protected]>
Date:   Fri Feb 17 22:10:07 2023 +0000

    Merged PR 3095: Remove duplicate SubArray class

    Remove duplicate SubArray class

commit 12b0d616f3af4e6ca66c1ca8805d60e9f2c86a77
Author: JUBI TANEJA <[email protected]>
Date:   Fri Feb 17 20:18:45 2023 +0000

    Merged PR 3073: vectorize masked load store

    This PR handles vectorization specifically for a masked buffer fill, where the output size is larger than the input. There is a conditional load and vector store.

    Given the nest:
    ```
            @nest.iteration_logic
            def _nest():
                def store_value():
                    Output[i] = Input[i]
                def store_zero():
                    Output[i] = 0
                _If(i < N_input, store_value).Else(store_zero)
    ```
    The unoptimized MLIR is as follows:
    ```
      %c0_i32 = arith.constant 0 : i32
      %c5 = arith.constant 5 : index
      "accv.lambda"() ({
        affine.for %arg2 = 0 to 8 {
          %0 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          scf.if %0 {
            %1 = affine.load %arg0[%arg2] : memref<5xi32>
            affine.store %1, %arg1[%arg2] : memref<8xi32>
          } else {
            affine.store %c0_i32, %arg1[%arg2] : memref<8xi32>
          }
        }
    ```
    On vectorizing this for loop, we get the vectorized MLIR (simplified version) as follows:
    ```
      %c5 = arith.constant 5 : index
      %cst = arith.constant dense<false> : vector<8xi1>
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c2 = arith.constant 2 : index
      %c3 = arith.constant 3 : index
      %c4 = arith.constant 4 : index
      %c6 = arith.constant 6 : index
      %c7 = arith.constant 7 : index
      %c0_i32 = arith.constant 0 : i32
      "accv.lambda"() ({
        affine.for %arg2 = 0 to 8 step 8 {

          %7 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %9 = "accv.cmp"(%0, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %11 = "accv.cmp"(%1, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %13 = "accv.cmp"(%2, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %15 = "accv.cmp"(%3, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %17 = "accv.cmp"(%4, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %19 = "accv.cmp"(%5, %c5) {predicate = 2 : i64} : (index, index) -> i1
          %21 = "accv.cmp"(%6, %c5) {predicate = 2 : i64} : (index, index) -> i1

          %23 = memref.reinterpret_cast %arg0 to offset: [0], sizes: [5], strides: [1] : memref<5xi32> to memref<5xi32>
          %24 = vector.transfer_read %23[%arg2], %c0_i32, %22 : memref<5xi32>, vector<8xi32>

          %25 = memref.reinterpret_cast %arg1 to offset: [0], sizes: [8], strides: [1] : memref<8xi32> to memref<8xi32>
          vector.store %24, %25[%arg2] : memref<8xi32>, vector<8xi32>
        }
    ```
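    The end-to-end semantics of the buffer fill being vectorized can be checked against a NumPy reference: copy the N_input prefix and zero the tail (a semantic model, not the generated code):

    ```python
    import numpy as np

    N_input, N_output = 5, 8
    inp = np.arange(1, N_input + 1, dtype=np.int32)   # memref<5xi32>
    out = np.empty(N_output, dtype=np.int32)          # memref<8xi32>

    # Equivalent of _If(i < N_input, store_value).Else(store_zero):
    out[:N_input] = inp      # masked transfer_read feeds these lanes
    out[N_input:] = 0        # remaining lanes take the zero constant
    assert out.tolist() == [1, 2, 3, 4, 5, 0, 0, 0]
    ```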

commit a7ccf52948927a11924c1c6249f34a02cae7b808
Author: Captain Jack Sparrow <[email protected]>
Date:   Fri Feb 17 16:48:17 2023 +0000

    Merged PR 3093: Add meaningful error messages for c++ exceptions

    Add meaningful error messages for c++ exceptions

commit 9ce019ce46a49d48ca8d8e47708022ac4985a4d6
Author: Captain Jack Sparrow <[email protected]>
Date:   Fri Feb 17 02:33:57 2023 +0000

    Merged PR 3092: Add type size getter utility

    Add type size getter utility

commit ff21c4ba07f20551719348e669e6b30d1265ef77
Author: Chuck Jacobs <[email protected]>
Date:   Fri Feb 17 01:09:32 2023 +0000

    Merged PR 3074: Add rudimentary pass to fix redundant load/store issue

    This PR adds a simple pattern to `ValueSimplifyPass` that looks for the redundant load/store pattern we often see at the end of kernels, and removes them.

commit 2a85a12d04f415084bddc268f1d7968cb05efe83
Author: Chuck Jacobs <[email protected]>
Date:   Fri Feb 17 00:10:01 2023 +0000

    Merged PR 3075: Enable `fast_exp` operation

    This PR makes a few changes to enable the `fast_exp` operation:
    - Adds `fast_exp` to the python DSL
    - Enables vectorization of `abs` instruction (which is used by `fast_exp`)

    It also makes a couple of other minor changes:
    - Improves auto-naming of nest indices
    - Better support for using custom LLVM builds with Accera

commit 9f90af1a4709228cfcc1a4a5dafbdf72f83aaf37
Author: Mason Remy <[email protected]>
Date:   Thu Feb 16 00:03:48 2023 +0000

    Merged PR 3088: Support dynamic sub_array shape, split_dim size

    Support dynamic sub_array shape, split_dim size

    This still requires that the sizes are static before lowering, but it
    supports dynamic sizes temporarily before inlining into an outer static
    function

commit 69b2486eae19085cdbeabc4d991dd578f04e2ee0
Author: Kern Handa <[email protected]>
Date:   Thu Feb 9 11:17:24 2023 +0000

    Merged PR 3078: Adds reinterpret_cast functionality to Array

    Adds reinterpret_cast functionality to Array

commit 74f09fe72fdc1facbf2b023c9d3307838afc1b3f
Author: Mason Remy <[email protected]>
Date:   Wed Feb 8 21:27:31 2023 +0000

    Merged PR 3070: Fixes for sub_array and _split_dimension

    Fixes for sub_array and _split_dimension

    This fixes the sub_array and split_dim ops to work with the Accera
    codebase that has been updated around them. Some MemoryLayout assumptions
    are getting in the way and have been disabled in the short term; long
    term, our memory layout behavior should more closely match what MLIR
    affine maps can represent, for more generalized dynamic support.

commit d8f2baa3ad40bbcfe3a40bb15aafff36819e6067
Author: Captain Jack Sparrow <[email protected]>
Date:   Wed Feb 8 05:52:55 2023 +0000

    Merged PR 3063: Refactor Dimension with C++ backend container class and few other fixes

    - Refactor Dimension with C++ backend container (ScalarDimension)
    - Enable output scalar variables
    - Fix dynamic sized TEMP arrays

commit ffc3d66a473c8b9f6d21aca39e3c515df4bb619b
Author: Lisa Ong <[email protected]>
Date:   Fri Feb 3 09:21:56 2023 +0000

    Merged PR 3072: Bump hatlib version to 0.0.34, skip unsupported test on arm64 macOS, minor targets doc update

    Update hatlib version since there is no incompatibility

v1.2.20

Remove Dimension.py