diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 0b1634d11a614..4b65589605306 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -16,6 +16,699 @@ - Did this and that ... intel/llvm#pr +# Release notes Mar'25 + +Release notes for commit range +[b0212c37b2](https://github.com/intel/llvm/commit/b0212c37b230d9dd3bb129df9f4ecc417b92ad8) +... +[b23d69e2c3](https://github.com/intel/llvm/commit/b23d69e2c3fda1d69351137991897c96bf6a586d) + +## New Features + +### Runtime compilation of SYCL code + +- [`sycl_ext_oneapi_kernel_compiler`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc) + extension specification was updated to accept `sycl` as source language, thus + providing functionality similar to + [NVRTC](https://docs.nvidia.com/cuda/nvrtc/). intel/llvm#11985, + intel/llvm#17446 +- Initial support for this feature was implemented. intel/llvm#16132, + intel/llvm#16222, intel/llvm#16132, intel/llvm#17640, intel/llvm#17356, + intel/llvm#16565, intel/llvm#17383, intel/llvm#17447, intel/llvm#17307, + intel/llvm#17373, intel/llvm#17331, intel/llvm#17329, intel/llvm#17266, + intel/llvm#17032, intel/llvm#16823, intel/llvm#16702, intel/llvm#16638, + intel/llvm#16316, intel/llvm#17359, intel/llvm#16485, intel/llvm#16821 +- Known issues and limitations are documented + [in the extension specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#non-normative-implementation-notes-for-dpc). 
intel/llvm#17307, + intel/llvm#17459 + +### SYCL graphs + +- Introduced and implemented + [`sycl_ext_codeplay_enqueue_native_command`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) + extension which allows including custom commands for interoperability with + native runtimes in graphs built using `sycl_ext_oneapi_graph` extension. + intel/llvm#16871 + +### Bindless images + +- Extended the extension + [specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc) + to support more kinds of copy operations (`image_mem_handle` to USM and vice + versa, USM to USM, etc.) and implemented them. intel/llvm#16661, + intel/llvm#17507 +- Extended the extension specification and implementation to support + `gather_image` device built-in function. The implementation was only done for + CUDA backend so far. intel/llvm#17322 + +### Native CPU Device + +- Added support for source-based code coverage on Native CPU. intel/llvm#15073 + +### KHR extensions + +Please note that KHR extensions are being specified and released by Khronos +Group. The process of completing and publishing a KHR extension takes a while, +but as implementors we need to prototype them early to help find possible issues +with specifications and ensure that they are implementable. + +During that stage of an extension's development its specification is incomplete +and subject to change without any notice. Therefore, we will refer to those +extensions using the word **prototyped**. Their implementation is not available by +default and requires `__DPCPP_ENABLE_UNFINISHED_KHR_EXTENSIONS` macro to be set +_before_ including the SYCL headers to make them available. 
Considering +that specifications of such extensions are not final and not versioned, their +prototypes may not exactly match the latest publicly available versions of the +corresponding specifications. There is no guarantee of completeness either. +You can find more details on our development processes in +[this document](sycl/doc/developer/KHRExtensions.md). + +The only reason those extensions are mentioned here is to give you a glimpse of +which extensions will be supported in future releases. We do +not recommend using such extensions right now, but advanced users who are +driving those extension specifications forward can do early experiments with +them to provide feedback to the Khronos Group. + +- Implemented + [`sycl_khr_default_context`](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:khr-default-context) + extension. intel/llvm#15645 +- **Prototyped** + [`sycl_khr_free_function_commands`](https://github.com/KhronosGroup/SYCL-Docs/pull/644) + extension. intel/llvm#16770, intel/llvm#17222 + +### Other extensions + +- Introduced and implemented + [`sycl_ext_oneapi_device_image_backend_content`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_image_backend_content.asciidoc) + extension which allows querying the underlying content of a device image for + interoperability with other runtimes (such as OpenCL or Level Zero). + intel/llvm#14811, intel/llvm#16633 +- Introduced and implemented + [`sycl_ext_oneapi_current_device`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_current_device.asciidoc) + extension which introduces another state into SYCL holding per-thread + `device`. 
intel/llvm#15382, intel/llvm#16970 +- Introduced and implemented + [`sycl_ext_oneapi_work_group_static`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_static.asciidoc) + and + [`sycl_ext_oneapi_work_group_scratch_memory`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_scratch_memory.asciidoc) + extensions that provide different ways of allocating and accessing device + local memory (i.e. shared by all work-items within a work-group). + intel/llvm#15061, intel/llvm#16325 + - The former is only supported on CUDA backend +- Introduced and implemented + [`sycl_ext_intel_kernel_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_kernel_queries.asciidoc) + extension. intel/llvm#16834 +- Implemented proposed + [`sycl_ext_intel_event_mode`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_intel_event_mode.asciidoc) + extension. intel/llvm#16108 +- Completed implementation of + [`sycl_ext_oneapi_launch_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc) + extension. intel/llvm#16709, intel/llvm#16051 +- Completed implementation of the + [`sycl_ext_oneapi_kernel_arg_properties`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_arg_properties.asciidoc) + extension by implementing missing `unaliased` property. intel/llvm#16090 + - It used to be called `restrict` in previous versions of the extension, but + a renaming was done to avoid conflict with C99 `restrict` type qualifier. 
+ intel/llvm#16814 +- Introduced and implemented the + [`sycl_ext_oneapi_num_compute_units`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_oneapi_num_compute_units.asciidoc) + extension. intel/llvm#16293, intel/llvm#16538 + +### New compiler options + +- Added support for `-f[no-]offload-fp32-prec-div` and + `-f[no-]offload-fp32-prec-sqrt` compiler flags to control precision of + floating-point division and square root. intel/llvm#15836, intel/llvm#16107, + intel/llvm#16993, intel/llvm#17044, intel/llvm#17033, intel/llvm#16942, + intel/llvm#16714, intel/llvm#17393, intel/llvm#17253 + +### Sanitizers + +#### Memory Sanitizer + +- Introduced memory sanitizer support. intel/llvm#15955, intel/llvm#16427, + intel/llvm#16478, intel/llvm#16935, intel/llvm#16535, intel/llvm#16477, + intel/llvm#16567, intel/llvm#16526, intel/llvm#16678, intel/llvm#16566, + intel/llvm#16619, intel/llvm#16705 + + It features: + - Checking for uses of uninitialized values in private memory. intel/llvm#17309 + - Checking for uses of uninitialized values in local memory, such + as `local_accessor` or `group_local_memory`. intel/llvm#17180, + intel/llvm#17054 + - Sanitizing USM operations like `memset` or `memcpy`. intel/llvm#16511 + +#### Thread Sanitizer + +- Introduced thread sanitizer support for device code. intel/llvm#17345, + intel/llvm#17211, intel/llvm#17155, intel/llvm#17181 + +## Improvements and bugfixes + +### `sycl_ext_oneapi_graph` extension + +- Reimplemented topological sort algorithm used to determine graph nodes + execution order to avoid issues with overflowing stack on huge graphs and + improve performance. intel/llvm#17495 +- Documented kernel binary update feature which allows updating kernel nodes + in graphs. This feature had been implemented earlier already. intel/llvm#14896 +- Introduced ability to update host-task nodes in graphs. 
intel/llvm#16853 +- Fixed race condition in `mutable_command_graph` node queries. intel/llvm#17012 +- Fixed the issue with not all graph-related classes fully implementing + common reference semantics. intel/llvm#16788 +- Documented interaction with `sycl_ext_oneapi_local_memory` extension. + intel/llvm#16379 +- Documented interaction with `sycl_ext_oneapi_work_group_memory` extension. + intel/llvm#16229 +- Made `ext_oneapi_weak_object` extension work with graph objects. + intel/llvm#16209 +- Fixed a bug where using `local_accessor` or `work_group_memory` objects as + part of whole graph update would function incorrectly on CUDA & HIP backends. + intel/llvm#16025 + +### SYCLcompat library + +- Introduced new set of group utility functions and classes aimed to reduce the + gap between `syclcompat` and `dpct` namespaces. intel/llvm#17263 +- 73e6b224aacf [SYCLCOMPAT] Forward launch arguments to avoid copies (#16965) + - Definitely user-visible, but I'm not sure how to word that +- Fixed `compare_mask` putting results in the wrong 2-byte segment of 4-byte + output. intel/llvm#16768 +- Optimized implementation of `permute_sub_group_by_xor` for the case when + `logical_sub_group_size == 32`. intel/llvm#16646 +- Added new function `ternary_logic_op` to perform bitwise logical operations + on three input values based on the specified 8-bit truth table. + intel/llvm#16509 +- 6e0d90e73ed1 [SYCLCompat] Fix vectorized_binary impl to make SYCLomatic migrated code run pass (#16553) + - Not sure how to word that +- 16c447998836 [SYCL][COMPAT] Replace T{-1} with static_cast(-1) for mask creation (#16527) + - bugfix? + +### Explicit SIMD extension + +- Extended + [`sycl_ext_intel_esimd`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md) + extension specification and implementation with new queries to check support + for 2d load/store/prefetch operations. 
intel/llvm#15905 +- Fixed miscompilations of ESIMD functions under high optimization levels when + the compiler performs aggressive inlining. intel/llvm#16193 + +### Sanitizers + +- ada16682e8c3 [DevASAN] Only report warning if passing host ptr to kernel (#16654) + - seems like a potentially important bugfix +- ce4a320806b2 [DeivceASAN] Make ShadowMemory one instance per type (#16687) + - seems like some kind of bugfix +- ef4d66af3b74 [DeviceSAN] Fix kernel name addressspace (#16425) +- 1fba00d3be7d [DeviceASAN] Fix ASAN with kernel assert (#16256) +- 34aeabab551e [SYCL][DeviceASAN] Fix AcceeChain to a matrix for bfloat16 (#16323) +- 6f3b0e857d15 [DevASAN] Do allocation with USM pool to reduce memory overhead (#16280) +- a8c6e7715be2 [DeviceASAN] Re-use shadow if required size is not larger than last one (#16258) + - As the above, some optimization of memory usage by ASAN? +- 201725664cc5 [DeviceSanitizer] Fix device global type of KernelMetadata (#16357) + +#### Address Sanitizer + +- Fixed ASAN throwing an exception with `UR_RESULT_ERROR_INVALID_ARGUMENT` when + detecting an incorrect memory free operation. intel/llvm#16706 + + + + +- bee8a397ac72 [UR][DeviceASAN] Sync the latest changes in asan_libdevice.hpp (#15911) +- 6347914485a8 [DeviceAsan] bugfixes for UR (#16257) +- e9143ca66108 [DeviceAsan] Report error when using unsupported API (#16281) +- 092cd2dfc034 [UR][DeviceASAN] Bugfix for mmap (#16466) +- 696514238e2e [DeviceASAN] Fix kernel release order (#16688) +- cc6148dfd17c [UR][DeviceASAN] Bugfix for GetDeviceType (#16745) +- 76c665363565 [DeviceSanitizers] Adjust backtrace addresses to call instruction (#17404) +- e2ab2b9ba963 [DevSan][Refactor] Make Options an unified class shared by all sanitizers (#17157) + +### Bindless images + +- Added support for timeline semaphores. 
intel/llvm#17395 +- Added support for `ext_oneapi_bindless_sampled_image_fetch_1d`, + `ext_oneapi_bindless_sampled_image_fetch_1d_usm`, + `ext_oneapi_bindless_sampled_image_fetch_2d`, + `ext_oneapi_bindless_sampled_image_fetch_2d_usm` and + `ext_oneapi_bindless_sampled_image_fetch_3d` aspects on Level Zero backend. + intel/llvm#16862 +- Fixed return types of image extent queries to match the specification. + intel/llvm#16829 +- Clarified the types of supported USM memory in the extension specification. + intel/llvm#16622 + +- 3161af314190 [SYCL][Ext][Bindless] Initial implementation of image spirv builtins on HIP (#16439) + - What exactly does it mean for end user? +- b732a3c4c9cb [SYCL][Bindless] Fix incorrect mangling of bindless images builtin functions (#16135) + - What was the user-visible effect of the issue? + +### Native CPU device + +- Improved support for `dynamic_address_cast` on Native CPU device. + intel/llvm#16676 +- Improved performance of Native CPU device: fewer memory allocations and thread + launches. intel/llvm#17102 +- Fixed a bug where submitting the same kernel multiple times at about the same + time with different arguments would lead to incorrect arguments being used. + intel/llvm#16995 +- Fixed compiler crashes when building applications that use atomics. + intel/llvm#16737 +- Fixed segfaults happening in SYCL CTS tests for `async_work_group_copy` + API. intel/llvm#16500 +- Improved support for sub-groups by updating version of oneAPI Construction + Kit. intel/llvm#16785 + +### Matrix + +- Aligned `joint_matrix_apply` implementation with the specification change + (intel/llvm#13153) to be able to modify both matrices. intel/llvm#16155 + +### Documentation + +- Proposed the + [`sycl_ext_oneapi_syclbin`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) + extension. 
intel/llvm#16784 +- Updated the + [`sycl_ext_intel_device_info`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md) + extension specification to clarify that no additional environment variables + are required anymore to make the extension functional. intel/llvm#16715 +- Updated the + [`sycl_ext_intel_device_info`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md) + extension to reflect the current level of support for it on different + backends. intel/llvm#16792 +- Fixed mistakes in APIs naming in the + [`sycl_ext_oneapi_peer_access`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_oneapi_peer_access.asciidoc) + extension specification. intel/llvm#17327 +- Fixed example provided in the + [`sycl_ext_oneapi_backend_level_zero`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_oneapi_backend_level_zero.md) + extension. intel/llvm#16901 +- Updated wording in the proposed + [`sycl_ext_oneapi_launch_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc) + extension to better match ISO C++ format and clarify how different overloads + are intended to behave. intel/llvm#16014 + +#### intel/llvm project + +This sub-category does not cover the product (Intel's SYCL implementation), but +it covers how you can engage and interact with the project, i.e. various +development processes. + +- Updated the project's + [security policy](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/SECURITY.md) + . 
intel/llvm#16559 +- Documented + [process](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/developer/KHRExtensions.md) + of prototyping KHR extensions. intel/llvm#16883 +- Documented + [process](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/developer/WorkingOnAReleaseBranch.md) + of working on release branches. intel/llvm#17042 +- Refreshed + [documentation](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/test-e2e/README.md) + on adding tests to the repository to reflect recent infrastructure + advancements/changes. intel/llvm#16409, intel/llvm#16875, intel/llvm#16967 + +### Support for new hardware + +- Updated + [`sycl_ext_oneapi_device_architecture`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc) + extension specification and implementation to recognize Intel Panther Lake + H & U GPUs and Intel Xeon processors codenamed Diamond Rapids. + intel/llvm#16294, intel/llvm#16543 +- Taught the compiler about optional features supported by Intel Panther Lake + H & U GPUs (necessary for the correct AOT compilation). intel/llvm#16368 +- Updated + [`sycl_ext_intel_matrix`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc) + extension specification and implementation to support Intel Xeon processors + codenamed Diamond Rapids. intel/llvm#16543 + +### Optimizations of SYCL Runtime + +Within this release some work has been done to reduce overheads incurred by +SYCL runtime over low-level runtimes (such as Level Zero or OpenCL): + +- Reduced number of string copies unnecessarily made by SYCL RT for debug traces + even if debug tracing is disabled. intel/llvm#16596 +- Reduced number of times `shared_ptr`s are copied. 
intel/llvm#17396, + intel/llvm#17477, intel/llvm#17473 +- Reduced number of memory allocations happening by moving away from using + `std::function`. This should also help with reducing compilation time of SYCL + headers. intel/llvm#17202, intel/llvm#16668 +- Reduced number of memory allocations required for `local_accessor`. + intel/llvm#17147, intel/llvm#17510 +- Reduced number of memory allocations on "fast" kernel enqueue path and dropped + some unnecessary runtime checks. intel/llvm#17312, intel/llvm#17376 +- Made more queue operations go through "fast" path. intel/llvm#16735 + +### Core SYCL 2020 functionality + +- Aligned `SYCL_LANGUAGE_VERSION` macro definition with the recent SYCL 2020 + spec change (KhronosGroup/SYCL-Docs#704). intel/llvm#15890 +- Implemented `swizzle` method for swizzles. intel/llvm#16353 + +### Other changes in SYCL Compiler + +- Introduced a new optimization to eliminate back-to-back barriers when it is + safe. Such chains of barriers may occur when multiple group algorithms are + used next to each other. intel/llvm#16750 +- Removed a busy-wait loop from the implementation of + `-fsycl-max-parallel-link-jobs` flag, making it consume fewer resources when + waiting. intel/llvm#17260 +- Made `-O0` the default optimization level when debug info is enabled + through `-g` flag. intel/llvm#16408 +- Uplifted maximum version of SPIR-V that compiler can generate to 1.5. + intel/llvm#16626 +- Made compiler embed device library needed for `bfloat16` support into the + application (if it is used). This change will allow us to reduce the size + of redistributable SYCL RT package by eliminating some files from it. + intel/llvm#16729 +- Added a compiler diagnostic (warning) about undefined `SYCL_EXTERNAL` + functions used in a module to help catch linking errors earlier. + intel/llvm#17346 +- Addressed issue intel/llvm#11531 where the compiler would generate invalid + SPIR-V if a kernel used arguments of boolean type. 
intel/llvm#17427 +- Switched to use native `bfloat16` implementation for devices that support it + (LNL, PVC), as well as fixed a bug where native implementation was not used + if multiple AOT targets are specified. intel/llvm#17154, intel/llvm#16240, + intel/llvm#16494 +- Aligned behavior of `-Wimplicit-float-conversion` with the upstream clang for + non-SYCL language modes. intel/llvm#16857 +- Added support for `dynamic_address_cast` on CUDA & HIP backends. + intel/llvm#16604 +- Fixed compilation errors when building applications that use `nearbyint` and + `rint` for HIP targets. intel/llvm#16373 +- Improved check for unsupported data types to actually rely on target + information instead of hardcoded knowledge. For example, this allows 128-bit + integers to be used in device code when targeting CUDA backend. + intel/llvm#17036 +- Fixed hangs on AMD and crashes on NVIDIA when `atomic_ref` is used with + `work_item` memory scope. intel/llvm#16172 +- Fixed `-fcuda-short-ptr` flag causing compilation errors. Its use will still + result in a warning that some implicitly linked object is not compiled with + that flag (namely some of our built-in libraries), but it shouldn't be a + problem because those libraries don't operate on pointers. intel/llvm#15642 +- Fixed intel/llvm#15852 where compilation with `-mlong-double-64` would still + result in an error that 128-bit double is not supported by a target. intel/llvm#16441 +- Fixed a bug where linking static libraries with SYCL code in them using + `-l:libname.a` spelling would ignore device code from those libraries. + intel/llvm#17149 +- Fixed a bug where having a pure virtual function marked as device one would + cause unresolved symbol errors emitted by device compiler on Windows. 
+ intel/llvm#16231 +- Fixed a bug where having two kernels (one annotated with + `reqd_work_group_size` attribute/property and another without it) together + with `-fsycl-device-code-split=off` would cause a runtime error about + mismatched work-group size. intel/llvm#16236 +- Fixed debug information for kernels that use global offset on HIP & CUDA + backends. intel/llvm#16963 + +### Other changes in SYCL Library + +- Made `group_[load|store]` functions use native built-ins when used with + vectors of 16 `short`s. intel/llvm#16581 +- Extended support for shared libraries to make it work with kernel bundles + as well. intel/llvm#16228 +- In response to intel/llvm#17114 added tracing (through `SYCL_UR_TRACE`) for + `SYCL_DEVICE_ALLOWLIST` decisions for better discoverability of the feature. + intel/llvm#17426 +- Aligned implementation of `info::execution_capability` query with the recent + SYCL 2020 specification change made in KhronosGroup/SYCL-Docs#625. + intel/llvm#16673 +- Fixed compilation issues with group functions like `select_from_group` with + certain data types (pointers, `marray` for example). + intel/llvm#17055 +- Implemented persistent cache eviction. intel/llvm#16289, intel/llvm#16522, + intel/llvm#16454 +- Enforced constraints documented by the + [`sycl_ext_oneapi_reduction_properties`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_reduction_properties.asciidoc) + extension. intel/llvm#16238 +- Clarified and enforced properties constraints in the + [`sycl_ext_oneapi_group_load_store`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_group_load_store.asciidoc) + extension specification and implementation. intel/llvm#16422 +- Implemented properties validation for kernel bundle and graph APIs. 
+ intel/llvm#15647 +- Updated the + [`sycl_ext_oneapi_in_order_queue_events`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_in_order_queue_events.asciidoc) + extension specification and implementation to make the event returned by + `ext_oneapi_get_last_event` optional for queues where no work had been + submitted. intel/llvm#16645 +- Updated the + [`sycl_ext_oneapi_group_load_store`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_group_load_store.asciidoc) + extension specification and implementation to accept the `alignment` property + in group load/store built-in functions to allow for more optimized + implementation. intel/llvm#16882, intel/llvm#16890 +- Lifted restriction that host APIs from `sycl_ext_oneapi_free_function_kernels` + had to be guarded by `#ifndef __SYCL_DEVICE_ONLY__`. intel/llvm#17446 +- Completely disabled legacy images support (from SYCL 1.2.1) on HIP backend. + They were previously available under an environment variable, but they worked + so poorly that there was no sense in keeping the support at all. intel/llvm#17296 +- Fixed potential resource leaks in online compiler extension. intel/llvm#16517 +- Fixed an issue where `known_identity` would return incorrect values + with `-ffast-math` flag. intel/llvm#17028 +- Fixed a UB in implementation of `device_global` which sometimes led to + spurious results. intel/llvm#16224 +- Fixed a `static_assert` failure in SYCL headers when an application is + built with `-funsigned-char`. intel/llvm#17133 +- Resolved intel/llvm#15606. The issue caused memory operations enqueued through + `sycl_ext_oneapi_enqueue_functions` extension to break functionality of + `sycl_ext_oneapi_enqueue_barrier` extension. 
intel/llvm#16223 +- Fixed a bug where compiling with `-D_FORTIFY_SOURCE=2` would cause errors + from device compilers at JIT stage (or during AOT compilation) about + undefined `__memcpy_chk` symbol. intel/llvm#16501 +- Fixed an incorrect result of `std::exp(std::complex)` in some corner cases. +- Fixed a crash happening when you launch a kernel that is defined in both the + application and a `dlopen`-ed shared library after that library was unloaded + through `dlclose`. intel/llvm#17091 +- Fixed issue intel/llvm#14357 about + `kernel_device_specific::compile_sub_group_size` info query returning + incorrect results for CUDA & HIP backends. intel/llvm#17137 +- Fixed a memory leak happening when a kernel submission failed. + intel/llvm#17125 +- Fixed a bug where using `vec::operator[]` would cause compilation issues on + Windows when an application is built using `clang.exe` and `_DEBUG` macro is + set. intel/llvm#17025, intel/llvm#17261, + intel/llvm#17440 + +#### Issues with 3rd-party host compilers + +- Fixed compilation issue with `get_vec_idx` internal helper with MSVC as + host compiler. intel/llvm#16480 +- Fixed missing `#include` when building with GCC 13 as host compiler. + intel/llvm#16480 +- Fixed compilation issue with joint matrix extension with MSVC from Visual + Studio 2019 as host compiler. intel/llvm#17336 + +### Support for pre-C++11 ABI + +Many SYCL APIs use `std::string` as argument or return type and it is known for +its ABI being broken by `gcc` at some point. There are applications which are +still built using old, pre-C++11 ABI and in order to support them, SYCL RT +should not have `std::string` (and some other classes) used at the ABI boundary. +This effort is largely complete, but some APIs still slip through from time +to time and are being fixed: + +- Added support for `print_graph` API in pre-C++11 ABI mode. intel/llvm#16194, + intel/llvm#16390 +- Added support for `pipe::get_pipe_name` API in pre-C++11 ABI mode. 
+ intel/llvm#16178 +- Decided **not** to support `get_backend_info` in pre-C++11 ABI mode (at least + for now) because there are no queries that could be done through it. Calling + it under pre-C++11 ABI mode now causes an error. intel/llvm#16272 + +## Misc + +- Removed testing on FPGA Emulator as a step in our strategy to drop FPGA + support (see intel/llvm#16929). Starting with this release there is no + guarantee that FPGA-specific features continue to work. intel/llvm#17223 +- Introduced new Unified Runtime adapter for Level Zero called `v2`. It is + expected to be more performant than the existing one, but it is still in + development and unused by default. intel/llvm#16656, intel/llvm#17407 +- Docker images containing nightly builds are not provided anymore, but we + still provide Dockerfiles so you can build those images yourself. + intel/llvm#16539 +- Fixed OCL CPU Runtime installation script leaving incorrect permissions on + a system folder. intel/llvm#16719 + +## API/ABI breakages + +### Changes that are effective immediately + +- Removed support for FPGA-related options as part of our strategy to drop FPGA + support (see intel/llvm#16929). Removed options: `-fintelfpga`, + `-fsycl-targets=spir64_fpga[-unknown-unknown]`, `-fsycl-link=early|image`, + `-Xsycl-target-backend=spir64_fpga "opt"`, `-reuse-exe=arg` and + `-fsycl-help=fpga`. intel/llvm#16864 +- Removed experimental `sycl_ext_intel_oneapi_compiler` extension support. Its + APIs have been marked as deprecated for a while and + `sycl_ext_oneapi_kernel_compiler` extension should be used instead. + intel/llvm#16776 +- Restricted accepted spellings for AMD targets in `-fsycl-targets` to + `amdgcn-amd-amdhsa`. 
intel/llvm#15990 + +### Deprecations + +These APIs are still present and tested, but they will be removed in future +releases: + +- Deprecated [`sycl_ext_oneapi_default_context`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/deprecated/sycl_ext_oneapi_default_context.asciidoc) + extension in favor of + [`sycl_khr_default_context`](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:khr-default-context) + extension. intel/llvm#17135 +- Deprecated `-fsycl-fp32-prec-sqrt` compiler flag in favor of + `-foffload-fp32-prec-sqrt` flag. intel/llvm#17257 +- Deprecated overloads of `single_task` and `parallel_for` APIs that accept + properties which used to be a part of `sycl_ext_oneapi_kernel_properties` + extension. `sycl_ext_oneapi_enqueue_functions` extension should be used + instead. intel/llvm#16728 + - Deprecated overloads were completely removed from the extension + specification. intel/llvm#14785 +- Deprecated current implementation of `get_backend_info` API. The SYCL 2020 + specification currently does not document anything that could be queried + through it and therefore existing queries supported through it are deprecated + to avoid possible confusion. intel/llvm#16700 + +### Upcoming API/ABI breakages + +These changes are available for preview under `-fpreview-breaking-changes` flag. +They will be enabled by default (with no option to switch to the old behavior) +in the next ABI-breaking release: + +- Removed implementation of `get_backend_info` APIs, see above in the + Deprecations section. intel/llvm#16700 + +## Known Issues + +- SYCL headers use unreserved identifiers which sometimes cause clashes with + user-provided macro definitions (intel/llvm#3677). Known identifiers include: + - `G`. intel/llvm#11335 + - `VL`. intel/llvm#2981 +- On Windows, the Unified Runtime's Level Zero leak check does not work + correctly with the default contexts. 
This is because on Windows + the release of the plugin DLLs races against the release of static global + variables (like the default context). +- Intel Graphics Compiler's Vector Compute backend does not support + O0 code: such code is often miscompiled, produces wrong answers + and crashes. This issue directly affects ESIMD code at O0. As a + temporary workaround, we optimize ESIMD code even in O0 mode. + [00749b1e8](https://github.com/intel/llvm/commit/00749b1e8e3085acfdc63108f073a255842533e2) +- When using `sycl_ext_oneapi_matrix` extension it is important for some + devices to use the sm version (Compute Capability) corresponding to the + device that will run the program, i.e. use `-fsycl-targets=nvidia_gpu_sm_xx` + during compilation. This particularly affects matrix operations using + `half` data type. For more information on this issue consult + https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma-restrictions +- C/C++ math built-ins (like `exp` or `tanh`) can return incorrect results + on Windows for some edge-case input. The problems have been fixed in the + SYCL implementation, and the remaining issues are thought to be in MSVC. +- There are known issues and limitations in virtual functions + functionality, such as: + - Optional kernel features handling implementation is not complete yet. + - AOT support is not complete yet. + - A virtual function definition and definitions of all kernels using it + must be in the same translation unit. Please refer to + [`sycl/test-e2e/VirtualFunctions`](https://github.com/intel/llvm/tree/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/test-e2e/VirtualFunctions) + to see the list of working and non-working examples. 
+ + +43ee65117935 [UR] Fix potential deadlock in the WaitEvent path of CmdBuffers (#16697) +40f0a6a0630b [SYCL][Graph] Fix L0 multi-device kernel bundles (#16343) +c5c0be57c660 [UR][CUDA][HIP] Add missing catch for native commands (#17524) +55a098709da2 [UR] Regenerate ur_valddi.cpp (#17487) +9c5762226d4f [UR] Fix cfi initialization (#17472) +a0df523df9c1 [UR] Deprecate UR_DEVICE_INFO_BFLOAT16 (#17053) +7650d831bb0f [UR] Check null pointer before handle in validation layer (#17474) +d6214ad114f8 [SYCL][UR][CUDA] Fix CMake CUPTI config (#17457) +44f120e92c12 [UR] Remove unnecessary unique pointer from cl program helper. (#17101) +19dbfb7605e9 [UR][L0] Create pool descriptors from subdevices... (#17465) +a9cb8f1e8540 [UR] Bump UMF to v0.11.0-dev4 (#17468) +1167ee6b388e [UR][CMake]Set CMAKE_MSVC_RUNTIME_LIBRARY to fix UMF linking issues (#17366) +fb582a748e42 [UR] [V2] Fix synchronization between command_list_manager usages (#17297) +b2db12abbb33 [UR] Fix typo in cfi flags handling. (#17452) +a1065507ce1f [UR] Handle adapters returning no platforms during testing (#17410) +768c5eaf4aca [UR] Generated code hidden by default in PR diffs (#17414) +2975d26c7050 [SYCL][UR][L0 v2] use blocking free when returning memory to the driver (#17375) +c342667c7e4f [UR] Updates to source checks job (#17160) +17df762be147 [SYCL][UR][CUDA] Use FindCUDAToolkit CMake module instead of FindCUDA (#17315) +0e155f0c1ab0 [SYCL][CUDA] Fix cupti library dynamic loading (#17272) +a000e56a9542 [UR][SYCL] Remove UR context atomic queries. (#16160) +38d750678fae [UR][L0] Manage UMF pools through usm::pool_manager (#17065) +c07039e2263d [UR] Add device info query for native assert. (#15929) +f870412188a5 [UR] Add UNSUPPORTED_FEATURE return for urDeviceGetGlobalTimestamps (#17389) +16713eae8c44 [UR] Allow loader to skip adapters based on prefilter device type. 
(#17072) +6ef844897f8a [UR][L0] Fix bfloat16 lookup to check for the extension (#17364) +68cacbfed4f5 [UR][L0] Disable Immediate Command List DG2 Windows (#17334) +607dff4c92a1 [UR] Stop using extension strings to report support for exp features. (#16046) +60ffdc3a97de [UR][L0] fix external semaphore with updated headers and report device info support (#17286) +255760e5af11 [UR] Fix various defects from static analysis (#17299) +fad173cbd14f [UR] Add UR_EXTERNAL_DEPENDENCIES CMake option (#17291) +a7774f2a74c5 [UR] Fix some tests that are broken when run with multiple cuda devices available. (#17216) +f01edd3abbb4 [UR] [L0] Update UR to link the Loader as static (#17104) +f36f787137d1 [UR][L0] Fix assignment of the in order flag for sync immediate list (#17199) +feed8b11e4fd [SYCL][CUDA] Fix adapter cupti linking (#17224) +b183751df103 [AsyncAlloc][UR][Exp] Initial API for async alloc entry points (#17117) +35fba198274c [UR][CUDA] Avoid unnecessary calls to cuFuncSetAttribute (#16928) +8fa2a120729b [UR] Improvements to align CTS and Spec for Program (#17094) +a4f976433067 [UR][CUDA] Change MAX_MEMORY_BANDWIDTH device query to uint64 (#16869) +5c76d4cd47ad [UR] Remove unnecessary and confusing unique_ptr usage (#17144) +1eccddb52284 [UR][L0 v2] check if copy offload is supported before requesting it (#17120) +646a5088f0d4 [UR] Update dependentloadflag for L0 adapters dlls (#17078) +2e288b083bc3 [UR] Improvements to align CTS and Spec for Device (#16746) +1515afacc6ae [UR][L0] Disable command-buffer immediate append path (#17097) +ae09897ff29d [UR] Use relative xpti/xptifw source when available (#17099) +d5bcb59367f7 [UR] Correct copyright string in Windows proxy loader (#17060) +5aa3157aba07 [SYCL][CUDA] Use UMF Proxy pool manager with UMF CUDA memory provider in UR (#17015) +e925b2b9f4c5 [SYCL][CUDA] Update UMF in UR to fix issue in LLVM (#17034) +5e01636b22ae Move Unified Runtime code into intel/llvm +f3d12f0167da Do not fetch cudart from gitlab for UMF (#16941) 
+23b2457304bd Use UMF CUDA provider in Unified Runtime (#16761) +64a095c36113 [SYCL][UR] Improve header copy dependencies (#17093) +928ed3e5a470 [UR][L0] Fix issue with command-buffer local mem update (#17069) +1638be92ec1d [UR] Choose in-tree unified-runtime directory if present (#16833) +113b46788672 [UR][L0]: MAX_COMPUTE_UNITS using ze_eu_count_ext_t (#16818) +d3e825ca3058 [UR][L0]: fix missing destroy of event given enqueue wait out event (#16759) +479da1d68964 [UR] Bump tag to 08d36b76 (#16810) +a6ebaa40ec28 [UR] Move urMemImageGetInfo success test from a switch to individual test (#16655) +a739d3418140 [UR] Make each profiling info variant for urEventGetProfilingInfo optional and improve its conformance test (#17067) +5f7043dc931a [UR] Don't set -pie on shared objects (#16880) +988c4777a709 [UR] In-order path for OpenCL command-buffers (#17056) +69941b863470 [UR] Make command-buffer creation descriptor mandatory (#17058) +0b979bf73689 [UR] Add remaining calls shared with queue in level-zero v2 adapter (#17061) +d142923d2a61 [UR][CL] Fix invalid use of dlopen() (#16736) +cf19f7758c6e [UR] fix parseDisjointPoolConfig and add tests (#16791) +02d2e34c1c83 [UR] Fix kernel arguments being overwritten in the CUDA and HIP adapters (#16733) +73f54e5296c5 [UR] Make adapters check native properties before dereferencing. (#16730) +9c65739ea12b [UR] Bump with DEVICE_INFO_PROGRAM_SET_SPECIALIZATION_CONSTANTS (#16659) +16ca790241fe [UR] Update tag to 8b7a9957 for https://github.com/oneapi-src/unified-runtime/pull/2582 (#16689) +b9a755831a77 [SYCL] Update UR tag for L0 synchronize fix (#16629) +8998b9b54f85 [UR] Unified clang format (#16672) +69cbf2a86d94 [UR] Bump UR version (main) with UMF v0.10.1 release (#16571) +a204f4031b45 [UR] Wrap urEventSetCallback when ran through loader (#16572) +48297dfbd765 [UR] Pull in fix needed for CTS device parameterization. 
(#16519) +c6b1edfb7512 [UR] Use reference counting on factories (#15296) +93de8f1c6127 [UR] Improve Kernel CTS (#16555) +c0e2fd19641e [UR] Bump tag to 3472b5bda (#16531) +8329e7bffca4 Bump for uninitialized-cuda-events-fix (#16469) +dcfdcfa249ab [UR] Update tag to ad288bb (#16512) +788ff7ffc01d [UR] Bump UR with zeCommandListImmediateAppendCommandListsExp usages fixes (#16458) +931a93b3fecf Bump UR and adjust use of urKernelSuggestMaxCooperativeGroupCountExp (#15966) +7a4a978e3483 [UR][L0] Fix Event Memory Leak due to no destroy on delete (#16410) +f1627498fc10 [UR][OpenCL] add a few missing Intel GPU device queries, fix device ID query (#16299) +a37bba8086f5 [UR] Update tag to 6d4eec8c for UR#2272 and UR#2336 (#15965) +fe88f1dc4e61 UR [SYCL][CUDA][HIP] Update images enable variable (#16147) + Has something to do with disabling images by default +5d8a55236008 [UR][L0] Bump main tag to 39df0317 (#16365) +28e84168b5e4 [UR] Update tag to 58e4d76 (#16322) +8106796a4093 [UR] Update tag to 45f3d8 (#16327) +8a41b47be40d [SYCL][UR][L0] Fix issue with event caching causing profiling tag conflicts (#16233) +06e57374b23d [UR] Pull in fixes for issues raised in latest coverity scan. (#16158) +33a0411c0bbf [UR] Interrupt-based event implementation (#16252) +590960a4205f [UR][L0] Add Support for External Semaphores (#16162) +c3eb1603adb2 [UR][L0] Disabling Driver In Order Lists by default (#16263) +6fd5143b5431 [UR] Update UR tag to include the fix to set the execution flag for all kernels (#16241) +b56ffc5f7c98 [UR] Update tag to 3fdf7e3 for UR #2353 and #2413 (#16262) +130a901922bc [UR][L0] fix event caching (#16207) +73b99be383ac [UR] Bump UR tag to eb076da (#16216) + # Release notes Nov'24 Release notes for commit range @@ -107,8 +800,6 @@ Release notes for commit range for SYCL Matrix. intel/llvm#15351 intel/llvm#15932 intel/llvm#15547 - Added support for specialization constants on Native CPU. intel/llvm#14446 - Added support for atomic fence on Native CPU. 
intel/llvm#14619 -- Added a new overload for `joint_matrix_apply` to be able to return result - into a different matrix. intel/llvm#13153 - Added `max_work_group_size`and `max_linear_work_group_size` kernel properties to allow users to specify the maximum work-group size that a kernel will be invoked with. intel/llvm#14518