[pull] main from openxla:main #5

Open · wants to merge 2,615 commits into main

Conversation

@pull pull[bot] commented Dec 8, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Dec 8, 2023
sogartar and others added 29 commits January 2, 2025 17:43
…#19582)

This fixes the error

```
ALREADY_EXISTS; HIP driver error 'hipErrorPeerAccessAlreadyEnabled' (704): peer access is already enabled; creating device 'hip'
```

This should not be treated as an error.
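For illustration, the fix amounts to filtering this status when enabling peer access. A minimal C++ sketch, assuming the raw HIP runtime API rather than IREE's actual HAL code:

```cpp
#include <hip/hip_runtime.h>

// Hypothetical helper: enable peer access, but treat "already enabled"
// as success since the desired state already holds.
hipError_t EnablePeerAccessIfNeeded(int peer_device) {
  hipError_t err = hipDeviceEnablePeerAccess(peer_device, /*flags=*/0);
  if (err == hipErrorPeerAccessAlreadyEnabled) {
    (void)hipGetLastError();  // Clear the sticky error state.
    return hipSuccess;
  }
  return err;
}
```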

Signed-off-by: Boian Petkantchin <[email protected]>
Still contains the revert of

llvm/llvm-project@169c32e

Signed-off-by: MaheshRavishankar <[email protected]>
This PR adds the unit attribute
`iree_codegen.tuning_spec_with_default_entrypoint` to indicate that a
tuning spec (typically the default one, though a user-provided spec can
work in the same manner) must contain exactly one named sequence operation
marked with `__kernel_config`. It also adds the corresponding verification
in the `verifyOperationAttribute` function.

This PR is relevant to a task in
#19214: add [a discardable attr
verifier](https://mlir.llvm.org/docs/DefiningDialects/#discardable-attribute-verification)
for `iree_codegen.tuning_spec_entrypoint` entry points.

Context:
Jakub proposed two approaches for verifying the default tuning
specification:
1. Implement a dedicated pass for verification.
2. Add a new attribute and update the verifyOperationAttribute function
accordingly.

After careful consideration, we agreed on the second approach to avoid
introducing an additional pass, ensuring a simple implementation.
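As a rough illustration of the second approach, a check of this kind could look like the following sketch (hypothetical and simplified; the real `verifyOperationAttribute` logic differs in detail):

```cpp
#include "mlir/Dialect/Transform/IR/TransformOps.h"
#include "mlir/IR/BuiltinOps.h"

// Hypothetical sketch: a module carrying
// `iree_codegen.tuning_spec_with_default_entrypoint` must contain exactly
// one transform.named_sequence op named `__kernel_config`.
static mlir::LogicalResult
verifyTuningSpecWithDefaultEntrypoint(mlir::ModuleOp module) {
  int numKernelConfigs = 0;
  module.walk([&](mlir::transform::NamedSequenceOp op) {
    if (op.getSymName() == "__kernel_config")
      ++numKernelConfigs;
  });
  return mlir::success(numKernelConfigs == 1);
}
```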

---------

Signed-off-by: Bangtian Liu <[email protected]>
This PR updates third-party/benchmark in IREE to address the use of
the RDCYCLE instruction on RISC-V. Starting from Linux 6.6 [1], RDCYCLE
is a privileged instruction and cannot be accessed directly from user
space. To ensure compatibility, this update switches to RDTIME, which,
while less accurate, has the advantage of being synchronized between
CPUs (and thus monotonic) and of constant frequency.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc4c07c89aada16229084eeb93895c95b7eabaa3
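For illustration, reading the time counter from user space on riscv64 looks like the following sketch (the actual change lives in the benchmark library's timer code):

```cpp
#include <cstdint>

// rdtime reads the user-visible time CSR, which remains accessible from
// user space on Linux >= 6.6, unlike rdcycle.
static inline uint64_t read_time_counter() {
  uint64_t t;
  asm volatile("rdtime %0" : "=r"(t));
  return t;
}
```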

Signed-off-by: Phoebe Chen <[email protected]>
If we want to build the PJRT CPU plugin, we'll run something like
```
pip install --no-deps -v ./integrations/pjrt/python_packages/iree_cpu_plugin/
```

It works well on the first run. But if we make some changes and want to
run it a second time, errors appear: CMake can no longer find the ninja
from the first run, because it was in a temporary build environment that
was removed after the first build finished.

We can remove the build dir to solve this problem, but that causes a
full rebuild and is quite annoying : )

Since the IREE compiler doesn't have this issue, I checked its build
script and found that it's solved via the function
`maybe_nuke_cmake_cache` in its
[setup.py](https://github.com/iree-org/iree/blob/76a7b893e4c62d52eae2c165bdb23952a8589689/compiler/setup.py#L177).
So I copied it into the setup.py of the PJRT plugin with some modifications:
- I think the PJRT plugin doesn't rely on the CPython API (although it
builds a shared library), so we don't need to pin the Python version;
- the build dir should be passed via a parameter, since we have plugins
for different platforms (cpu/cuda/rocm/...).

I also used this chance to add `cmake` to the build dependencies, in
case some users don't have CMake installed on their system.

ci-exactly: build_packages, test_pjrt

Signed-off-by: PragmaTwice <[email protected]>
…lt (#19590)

This PR is a follow-up to
llvm/llvm-project#117340.

It disables `lowerPadLikeWithInsertSlice` and
`lowerUnpadLikeWithExtractSlice` so that `tensor.insert_slice` or
`tensor.extract_slice` ops won't appear when the high dimensions are unit
dimensions.

---------

Signed-off-by: jerryyin <[email protected]>
This adds implementations for "getIterationDomainTileFromOperandTile"
and "getTiledImplementationFromOperandTile" to linalg_ext.scatter. This
allows fusing scatters with producer loops during tiling. The
implementation of these methods is trivial because the iteration domain
is already defined in terms of the input operands, so we can just invoke
the tiling implementation.
See prior updates: #16028
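A hedged sketch of the forwarding idea (method names follow MLIR's `TilingInterface`; the exact signatures in the LinalgExt implementation may differ):

```cpp
#include "mlir/Interfaces/TilingInterface.h"

using namespace mlir;

// Since scatter's iteration domain is defined directly by the operand
// shapes, an operand tile can be translated 1:1 into an iteration-domain
// tile and handed to the regular tiled lowering.
FailureOr<TilingResult> tiledImplementationFromOperandTileSketch(
    TilingInterface op, OpBuilder &b, unsigned operandNumber,
    ArrayRef<OpFoldResult> offsets, ArrayRef<OpFoldResult> sizes) {
  SmallVector<OpFoldResult> iterOffsets, iterSizes;
  if (failed(op.getIterationDomainTileFromOperandTile(
          b, operandNumber, offsets, sizes, iterOffsets, iterSizes)))
    return failure();
  return op.getTiledImplementation(b, iterOffsets, iterSizes);
}
```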

> Happy New Year 🎉
> 
> Yes, this is a bit silly. We still like to intentionally update the
copyright year in this one location so the website appears fresh.
Still carries revert of
llvm/llvm-project@169c32e

Signed-off-by: MaheshRavishankar <[email protected]>
The error message should be `cuda` instead of `rocm`.
In `flow.call` op, there are two custom `OpBuilder` declarations:

https://github.com/iree-org/iree/blob/76a7b893e4c62d52eae2c165bdb23952a8589689/compiler/src/iree/compiler/Dialect/Flow/IR/FlowOps.td#L983-L996

And the second one is defined in `FlowOps.cpp`:

https://github.com/iree-org/iree/blob/76a7b893e4c62d52eae2c165bdb23952a8589689/compiler/src/iree/compiler/Dialect/Flow/IR/FlowOps.cpp#L1579-L1583

However, the function definition of the first one is missing. If we try
to use it, we'll get a linker error like "undefined symbol" in the build
phase.

So in this PR I add a definition for the first `OpBuilder` (inline in
the TableGen file instead of `FlowOps.cpp`, since it's simple).

---------

Signed-off-by: PragmaTwice <[email protected]>
…19460)

The loop here iterates over the arguments of a dead operation. This
sometimes works if the operation happens to use the same memory for its
iter arguments, but it relies on undefined behavior. This patch
restarts the check each time a new loop is created.

No tests for this one, because it sometimes works, depending on how the
memory allocator allocates the operation.

---------

Signed-off-by: Groverkss <[email protected]>
Signed-off-by: MaheshRavishankar <[email protected]>
Co-authored-by: MaheshRavishankar <[email protected]>
These changes are needed to be able to propagate reshapes and fold unit
dimensions. This essentially changes `scatter` to be more closely in
line with
[tf.tensor_scatter_nd_update](https://www.tensorflow.org/api_docs/python/tf/tensor_scatter_nd_update)
except with a `dimension_map` (side note: the linked tensorflow docs
have a really good explanation of the op).

This also removes support for non-contiguous scatters, because the slice
must be right-justified (along the innermost dimensions of `updates` and
`original`) to prevent ambiguity around how to index `original` and how
to scatter `updates`.

#### Overview:
- Update verifier to handle multiple batch dimensions. Restrict
`dimension_map` to allow indexing only of the outermost
  dimensions, ensuring slices are inserted contiguously.
- Fix `TilingInterfaceImpl` to support multiple "batch" dimensions
  and add test cases to `convert_to_loops.mlir` and `tiling.mlir`
- Fix `ScatterOp` description to align with verifier
- Add new test cases for `ScatterOp` and remove a few that are no longer
supported.

---------

Signed-off-by: Ian Wood <[email protected]>
This change uses the result types as a part of the hash when grouping
ops. This vastly improves the performance of this pass when there are
several similar objects that consist of ops with the same names but
differ in the number/type of results. However, this may increase the
overhead of hashing when bucketing isn't effective.

Although this is a sample size of one, I found that for 405b tp8 the
number of buckets went from 35 to 140. This brought the time of this
pass down from a few minutes to several seconds.
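A minimal sketch of the idea (illustrative, not the pass's actual code): fold the result types into the per-op hash so that ops sharing a name but differing in result arity or types land in different buckets.

```cpp
#include "llvm/ADT/Hashing.h"
#include "mlir/IR/Operation.h"

// Hash an op by its name plus its result types for bucketing.
llvm::hash_code hashOpForBucketing(mlir::Operation *op) {
  llvm::hash_code hash = llvm::hash_value(op->getName().getStringRef());
  for (mlir::Type type : op->getResultTypes())
    hash = llvm::hash_combine(hash, mlir::hash_value(type));
  return hash;
}
```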

Signed-off-by: Ian Wood <[email protected]>
Adds nanobind reverts on top of #19600 to allow the macOS build to pass
(see #19591).
…ute (#19603)

This PR generalizes the cases in which the linking pass can be skipped
based on the presence of the default entry point attribute.

---------

Signed-off-by: Bangtian Liu <[email protected]>
#19113 uncovered some problems with
the logic in this pass.

Fixes two problems:
1. If a consumer cannot be collapsed, producers can only collapse
dimensions not touched by the consumer
2. When updating which consumer loops can be collapsed, the
reassociation of the producer must be taken into account, since it's
possible they are not all contiguous (e.g. a transpose on an input).
This is the same logic as in `updateFromConsumer`.

---------

Signed-off-by: Ian Wood <[email protected]>
If the status is an error status that we passed in, it will
be passed back to us. It is incorrect to join it with itself.
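A minimal sketch of the hazard, assuming a hypothetical callee `do_work` that may return the very status object it was given:

```cpp
#include "iree/base/api.h"

iree_status_t do_work(iree_status_t status);  // Hypothetical callee.

iree_status_t process(iree_status_t status) {
  iree_status_t new_status = do_work(status);
  // Only join when the returned status is a distinct object; joining a
  // status with itself is invalid.
  if (new_status != status) {
    status = iree_status_join(status, new_status);
  }
  return status;
}
```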

Signed-off-by: Andrew Woloszyn <[email protected]>
This change adds patterns to drop the unit dims of an
`iree_linalg_ext.scatter`'s `%updates` tensor. It only drops the leading
unit dimensions from the portion of `updates` that represents the
indexed dimensions.


See the main issue #19091

---------

Signed-off-by: Ian Wood <[email protected]>
With llvm/llvm-project@10ef20f, support for `MLIR_LINK_MLIR_DYLIB` was
introduced. With `LLVM_LINK_LLVM_DYLIB` set to `ON` at
https://github.com/iree-org/iree/blob/cdf24b9be0354f06879ba08db85ff8a5dbe49b14/build_tools/llvm/llvm_config.cmake#L30,
this setting is propagated to `MLIR_LINK_MLIR_DYLIB`. This breaks the BYO
LLVM workflow (see #19549); hence, it is set to `OFF`.
Fixes #17344.

After nod-ai/SHARK-TestSuite#418, there are only
two tests running in that test suite, both of which are XFAIL'd due to
programs needing to be regenerated.
Carries 4 reverts.

Related to Nanobind issues:

- llvm/llvm-project@5cd4274
- llvm/llvm-project@08e2c15
- llvm/llvm-project@b56d1ec

Related to RISC-V compilation:

- llvm/llvm-project@169c32e

---------

Signed-off-by: MaheshRavishankar <[email protected]>
Signed-off-by: MaheshRavishankar <[email protected]>
This avoids the need for string manipulation at runtime and is what
the HSA API expects.
We don't support custom debug sinks in the Runtime Python bindings, in
particular the ability to register a custom callback when tracing
tensors.

This change makes it possible to create a HAL module with a Python
function as a callback. This implementation does not handle the case of
referencing, directly or indirectly, the HAL module, VM context, or VM
instance in the callback function object. In such a scenario the
circular reference will not be collected by the garbage collector and
will leak. No check is done to guard against this. It is possible to
traverse the Python object structure to detect a reference to VM
objects, but it would require more effort.

This change also adds a callback to the debug sink in the IREE native
runtime API that signals when the runtime is done using the debug sink.
We need this because the Python objects corresponding to native runtime
objects are ephemeral and cannot be used to hold the reference to the
debug sink.

---------

Signed-off-by: Boian Petkantchin <[email protected]>
COMPILER_TARGET_BACKEND is something we should deprecate in the future.
This was incorrectly assuming that ordinals are always allowed (they
aren't) and that there are exactly as many physical devices with ordinals
as there are enumerable logical devices.
dependabot bot and others added 30 commits February 4, 2025 15:04
In #19902 we added reporting of
errors in `LLVMCPUTargetCLOptions::getTargetOptions`, which allows
reporting things like an unknown CPU before it causes assertion
failures in LLVM. But we mistakenly also reported the warning
about the implicit CPU fallback there, which is a false positive in this
case, as it triggers on default targets that we may not actually use.

Signed-off-by: Benoit Jacob <[email protected]>
#17593

While reproducing this, I was caught by an error with the following
`unpack.mlir`:
```mlir
func.func @unpack(%arg0: tensor<1x5x2x64xf32>) -> tensor<2x320xf32> {
  %0 = tensor.empty() : tensor<2x320xf32>
  %unpack = tensor.unpack %arg0 outer_dims_perm = [0, 1] inner_dims_pos = [0, 1] inner_tiles = [2, 64] into %0 : tensor<1x5x2x64xf32> -> tensor<2x320xf32>
  return %unpack : tensor<2x320xf32>
}
```
 
Script to reproduce:
```
iree-opt --mlir-print-ir-before-all --mlir-pretty-debuginfo \
--pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{enable-vector-masking=true}))" \
--split-input-file  unpack.mlir
```


Compilation error workaround.

This impacts the ability to horizontally fuse the matmuls that feed into
the `Q-K-V` transpose. The improvements seen with the change might have
been due to a reduction in copy overheads, which are no longer an issue.

Signed-off-by: MaheshRavishankar <[email protected]>
The TableGen had some strange auto-generated polymorphism with implicit
parsing of certain fields. None of it provided any benefit, so it is
simplified down to just the MMA enum. This also replaces the enum
attribute with an enum parameter, removing the extra `.getValue()`
indirection when accessing the enum.
There are a ton of bit-extend ops getting hoisted that simply
convert the weights from f16 to f32 (these ops are fairly small, so they
don't trigger the max size increase threshold, i.e. 1024 elements).
Instead, we want these ops to be fused with their consumers to prevent
materializing the high bit-width tensors.
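A hypothetical predicate illustrating the kind of op this targets (illustrative only, not the actual pass logic):

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// An elementwise op that only widens the element bit-width (e.g. f16 ->
// f32) should stay fused with its consumers rather than being hoisted.
static bool isBitExtendLike(Operation *op) {
  if (op->getNumOperands() != 1 || op->getNumResults() != 1)
    return false;
  auto inType = dyn_cast<ShapedType>(op->getOperand(0).getType());
  auto outType = dyn_cast<ShapedType>(op->getResult(0).getType());
  if (!inType || !outType)
    return false;
  return outType.getElementTypeBitWidth() > inType.getElementTypeBitWidth();
}
```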

---------

Signed-off-by: Ian Wood <[email protected]>
* Add warning about ongoing TOSA changes and recommend installing old
versions per #19777.
* Refresh sample code to download from Kaggle instead of a deleted GCS
bucket, making more progress on
#18518. I couldn't find an
equivalent posenet i8 model, so I used a float32 version that expects
different dimensions.
This is based on the materialize-encoding-into-nop pass, with additional
patterns to handle load and store ops.
For now, we materialize a load of a padded input with an extract slice,
and a store as an insert slice into a larger tensor. These get folded and
become partial loads/stores at the end, but we can change this later.

Enable this by default in the LLVMGPU pass pipeline as it's a superset
of the existing nop encoding materialization pass.

---------

Signed-off-by: Jakub Kuderski <[email protected]>
These types are not available in NumPy so no interoperability is
provided for them.

---------

Signed-off-by: Boian Petkantchin <[email protected]>
)

Progress on #18174, updating some
stale documentation.

> [!NOTE]
> Demo here:
https://scotttodd.github.io/iree/guides/deployment-configurations/cpu/

Changes included:

* Switch examples to use ONNX instead of TensorFlow given that users are
trying to use TensorFlow and failing:
#19852
* Add more documentation for CPU targets and features for
#18561
* Standardize some formatting across CPU/CUDA/ROCm/Vulkan pages
* Adjust some parts of the ONNX guide now that support is more mature
…19726)

The revision adds support for the rest of the AffinityOps that have the
TensorPhase trait, i.e., the TensorCloneOp, TensorSliceOp, TensorFillOp,
and TensorUpdateOp ops. It is tricky to handle encodings for transfer
ops, so only the encoding in the fill op is updated. If other operations
have tensor encodings, it returns a failure for now.

There are two stream tensor ops that do not implement the
AffinityOpInterface, so they are not supported in this revision: the
stream.tensor.load and stream.tensor.store ops. We should be able
to track the resource affinity for these two ops, but it requires
additional analysis, so they are out of scope for this revision.

The revision also adds the missing documentation to the
`addLayoutsToTensorPhaseOps` method.

---------

Signed-off-by: hanhanW <[email protected]>
Revert commits:

-
llvm/llvm-project@8c1dbac

The author is working on a fix, and it is not ready yet.

---------

Signed-off-by: hanhanW <[email protected]>
This skips tiling large fills for the same reasoning as in #19887
We had previously cherry-picked
llvm/llvm-project@73f11ac
in #19939.

Now we're integrating up to that commit, so it's no longer a
cherry-pick.

Reverting llvm/llvm-project#125789 because it
breaks TorchToTosa, in torch-mlir. We will need to wait for this to be
resolved in torch-mlir, then simultaneously bump torch-mlir and drop the
revert.

Cherry-pick a Bazel fix:
llvm/llvm-project@4df287a

---------

Signed-off-by: Benoit Jacob <[email protected]>
…umers (#19804)

This PR adds new logic in ConfigUtils.cpp to analyze a dispatch and
determine required multiples of workgroup tile sizes for the root
operation. This affects dispatches that contain either tensor.pack or
tensor.unpack ops, since the pack and unpack ops require the workgroup
tile sizes to be a multiple of their inner_tiles in order for them to be
fused into the workgroup scf.forall loop. The following example of a gpu
set_encoding dispatch illustrates the new constraint imposed by this PR:

```mlir
%in = flow.dispatch.tensor.load ... -> tensor<256x64xi8>
%pack = tensor.pack %in ... inner_tiles = [128, 64] ... tensor<256x64xi8> -> tensor<2x1x128x64xi8>
%expanded = tensor.expand_shape %pack [[0], [1], [2, 3, 4], [5, 6, 7]]
    : tensor<2x1x128x64xi8> into tensor<2x1x4x8x4x2x4x8xi8>
// linalg.transpose is the root op. The workgroup tile sizes must contain an
// even multiple of the tensor.pack inner_tiles.
%transposed = linalg.transpose
    ins(%expanded : tensor<2x1x4x8x4x2x4x8xi8>)
    outs(%empty : tensor<2x1x8x4x4x4x2x8xi8>)
    permutation = [0, 1, 3, 6, 2, 4, 5, 7]
flow.dispatch.tensor.store %transposed
```

Since the linalg.transpose is the root op, it needs to be aware of its
producer chain when selecting tile sizes. With this PR, the lowering
config selection logic will walk producers until it hits an unsupported
operation or a block argument, and find the LCM of any pack or unpack
tiles along the dimensions of their inner_tiles. In the above example,
this would look like the following:

1. Walk producer chain up to the producer of `tensor.pack`, and stop at
the `flow.dispatch.tensor.load`. The initial workgroup tile size
multiples will be `[1, 1]` (i.e., no constraint for unsupported ops).
2. The workgroup tile sizes will be propagated through the
`tensor.pack`, which updates the workgroup tile size multiples to `[1,
1, 128, 64]`.
3. Then, it will propagate through the `tensor.expand_shape`, which will
expand the workgroup size multiples if possible. In this case, they are
expanded to `[1, 1, 4, 8, 4, 2, 4, 8]`.
4. Now walk the consumer chain to find the multiples for the workgroup
tile slice of the root op result. In this case, the propagation simply
stops at the `flow.dispatch.tensor.store`, and the multiples are `[1, 1,
1, ...]`.
5. Now the root op has the required workgroup tile size multiples for
the operand and result slices, and the multiples for the iteration space
of the op are computed based on the indexing maps of the operation, by
taking the LCM along each dimension of that dimension's multiples from
all operands and results. In this case the final workgroup tile size
multiples would become `[1, 1, 8, 4, 4, 4, 2, 8]`.
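The combining step in item 5 boils down to a per-dimension LCM. A minimal sketch with illustrative names (not the actual ConfigUtils.cpp code):

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// For each iteration dimension, take the LCM of the required multiples
// contributed by every operand/result slice of the root op.
std::vector<int64_t> combineTileSizeMultiples(
    const std::vector<std::vector<int64_t>> &perOperandMultiples,
    size_t numLoops) {
  std::vector<int64_t> combined(numLoops, 1);
  for (const auto &multiples : perOperandMultiples)
    for (size_t dim = 0; dim < multiples.size() && dim < numLoops; ++dim)
      combined[dim] = std::lcm(combined[dim], multiples[dim]);
  return combined;
}
```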

---------

Signed-off-by: Max Dawkins <[email protected]>
Fixes a bug where the `transform.iree.match.cast_compatible_dag_from_root`
op failed to match when there are repeated operands.

---------

Signed-off-by: Max Dawkins <[email protected]>
This test is flaky on CI. I can't reproduce the issue locally and I'm
not sure why the file would not be found or would have errors being
opened. Maybe due to too much ctest parallelism?

Sample logs:

*
https://github.com/iree-org/iree/actions/runs/12926084982/job/36048365781#step:10:187
*
https://github.com/iree-org/iree/actions/runs/13154155457/job/36707383211#step:10:157

```
  34/1546 Test   #12: iree/tools/test/iree-dump-parameters.txt.test ....................................................................***Failed    3.02 sec
-- Testing: 1 tests, 1 workers --
FAIL: IREE :: test/iree-dump-parameters.txt (1 of 1)
******************** TEST 'IREE :: test/iree-dump-parameters.txt' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: (iree-dump-parameters    --parameters=a=C:/home/runner/_work/iree/iree/tools/test/parameters_a.safetensors    --parameters=b=C:/home/runner/_work/iree/iree/tools/test/parameters_b.safetensors) |   FileCheck C:/home/runner/_work/iree/iree/tools/test/iree-dump-parameters.txt
+ iree-dump-parameters --parameters=a=C:/home/runner/_work/iree/iree/tools/test/parameters_a.safetensors --parameters=b=C:/home/runner/_work/iree/iree/tools/test/parameters_b.safetensors
+ FileCheck C:/home/runner/_work/iree/iree/tools/test/iree-dump-parameters.txt
C:\home\runner\_work\iree\iree\runtime\src\iree\io\file_handle.c:223: UNKNOWN; failed to open file 'C:/home/runner/_work/iree/iree/tools/test/parameters_a.safetensors'; stack:
  0x00007ff6326c6754 iree-dump-parameters <iree_io_file_handle_platform_open+0x1a4> (C:\home\runner\_work\iree\iree\runtime\src\iree\io\file_handle.c:221)
  0x00007ff6326c6283 iree-dump-parameters <iree_io_file_handle_create_or_open+0x83> (C:\home\runner\_work\iree\iree\runtime\src\iree\io\file_handle.c:367)
  0x00007ff6326c6528 iree-dump-parameters <iree_io_file_handle_open+0x78> (C:\home\runner\_work\iree\iree\runtime\src\iree\io\file_handle.c:419)
  0x00007ff6326ac3de iree-dump-parameters <iree_io_open_parameter_file+0x13e> (C:\home\runner\_work\iree\iree\runtime\src\iree\tooling\parameter_util.c:93)
  0x00007ff6326ac224 iree-dump-parameters <iree_io_append_parameter_file_to_index+0x64> (C:\home\runner\_work\iree\iree\runtime\src\iree\tooling\parameter_util.c:130)
  0x00007ff6326ac732 iree-dump-parameters <iree_tooling_build_parameter_indices_from_flags+0xd2> (C:\home\runner\_work\iree\iree\runtime\src\iree\tooling\parameter_util.c:166)
  0x00007ff6326a377a iree-dump-parameters <main+0xca> (C:\home\runner\_work\iree\iree\tools\iree-dump-parameters-main.c:138)
  0x00007ff6326d4f88 iree-dump-parameters <__scrt_common_main_seh+0x10c> (D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288)
  0x00007ff954d34cb0 ??? <BaseThreadInitThunk+0x10>
  0x00007ff95a21edcb ??? <RtlUserThreadStart+0x2b>

FileCheck error: '<stdin>' is empty.
FileCheck command line:  C:\mnt\azure\b\092750\llvm-project\bin\FileCheck.exe C:/home/runner/_work/iree/iree/tools/test/iree-dump-parameters.txt

--

********************
********************
Failed Tests (1):
  IREE :: test/iree-dump-parameters.txt
```

skip-ci: not tested by presubmit
These backends default to disabled, so enable them on this CI config.
The backends are already tested in other jobs too.

I hoped this might help spot issues like
#19875, but it doesn't seem that
way.

| | Before | After |
| -- | -- | -- |
| Logs | [logs here](https://github.com/iree-org/iree/actions/runs/13118302608/job/36597989009) | [logs here](https://github.com/iree-org/iree/actions/runs/13118149869/job/36597451528?pr=19883) |
| Number of build targets | 8442 | 8715 |
| Number of `iree-test-deps` targets | 1067 | 1336 |
| Number of tests | 1538 | 1552 |


ci-exactly: linux_x64_clang
Carrying the existing revert of
llvm/llvm-project#125789 because it breaks
TorchToTosa, in torch-mlir. We will need to wait for this to be resolved
in torch-mlir, then simultaneously bump torch-mlir and drop the revert.

Signed-off-by: Benoit Jacob <[email protected]>
In order to rewrite subspans to buffer descriptors, we might need to be
able to fold offsets into the buffer descriptors. This means that we
need to be able to replace an offset with a different one (specifically
0), because the offset will be applied to the base pointer during buffer
casts. If the offset is dynamic, we can always `memref.cast` the
dynamicness of the offset back in, but we can't replace a static offset
with a different static offset. Therefore, never create buffers that
have a static non-zero offset during bufferization.
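In type terms, the rule can be sketched as follows (an illustrative helper, not the pass's actual code): rewrite a strided layout's static offset to a dynamic one so later rewrites can substitute a different value through a `memref.cast`.

```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Replace a memref's static layout offset with a dynamic one so the
// offset can later be changed (e.g. folded to 0) via memref.cast.
MemRefType makeOffsetDynamic(MemRefType type) {
  auto strided = dyn_cast<StridedLayoutAttr>(type.getLayout());
  if (!strided)
    return type;
  auto newLayout = StridedLayoutAttr::get(
      type.getContext(), /*offset=*/ShapedType::kDynamic,
      strided.getStrides());
  return MemRefType::get(type.getShape(), type.getElementType(), newLayout,
                         type.getMemorySpace());
}
```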
Integrate at llvm/llvm-project@001ba42f

Carrying the existing revert of
llvm/llvm-project#125789 because it breaks
TorchToTosa, in torch-mlir. We will need to wait for this to be resolved
in torch-mlir, then simultaneously bump torch-mlir and drop the revert.

Signed-off-by: Benoit Jacob <[email protected]>
…9923)

This is in preparation for the modified way of generating horizontally
fused GEMMs. This PR adds kernel configuration for these GEMM ops to
allow them to go down the vector distribute pipeline.

---------

Signed-off-by: MaheshRavishankar <[email protected]>
This patch removes spurious CAPI dependencies from non-CAPI libraries.

CAPI libraries should never be added to non-CAPI libs, as they end up
causing `multiple definition` linking errors.

Signed-off-by: fabian <[email protected]>