Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

TBD

Fixed

  • fMHA: Fixed the backward (BW) pass on Sm86/Sm89 GPUs when K > 64 (RTX 3090, RTX 4090, A6000, etc.) [facebookresearch#631]

Added

[0.0.16] - 2023-01-31

Fixed

Added

[0.0.15] - Skipped

[0.0.14] - 2022-11-10

Fixed

  • fMHA/CUTLASS: The current CUDA stream is now used by the kernel [facebookresearch#491]
  • fMHA/CUTLASS: Improved overall performance

Added

  • SwiGLU: Added xformers.ops.SwiGLU and its functional counterpart (xformers.ops.swiglu) [facebookresearch#490]
  • fMHA: It is now possible to combine CUTLASS's forward pass with flash-attention's backward pass [facebookresearch#469]; this improves performance on A100 for K = 128
  • fMHA: Added a custom xformers.ops.unbind operator to avoid a cat in the attention block [facebookresearch#458]
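For readers unfamiliar with SwiGLU, the following pure-Python sketch shows the computation a SwiGLU feed-forward block performs: out = W3(silu(W1·x + b1) * (W2·x + b2)) + b3. It is a reference illustration only, not xformers.ops.swiglu itself; the names swiglu_reference, linear, silu and sigmoid are hypothetical helpers, and the argument order is an assumption to check against the library documentation.

```python
import math


def sigmoid(v: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def silu(v: float) -> float:
    # SiLU (a.k.a. Swish): v * sigmoid(v).
    return v * sigmoid(v)


def linear(x, w, b):
    # y = x @ w.T + b, where w is a list of output rows (each row has len(x) entries).
    return [sum(xi * wi for xi, wi in zip(x, row)) + bj for row, bj in zip(w, b)]


def swiglu_reference(x, w1, b1, w2, b2, w3, b3):
    # Hypothetical reference: out = W3(silu(W1 x + b1) * (W2 x + b2)) + b3.
    x1 = linear(x, w1, b1)  # gated branch, passed through SiLU
    x2 = linear(x, w2, b2)  # linear branch that multiplies the gate
    hidden = [silu(a) * g for a, g in zip(x1, x2)]
    return linear(hidden, w3, b3)  # output projection
```

With identity weights and zero biases, swiglu_reference([1.0, 2.0], ...) reduces to [silu(1.0) * 1.0, silu(2.0) * 2.0], which is a quick way to sanity-check the wiring of the two branches.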

[0.0.13] - 2022-09-26

Added

  • fMHA: Added a CUTLASS-based kernel for xformers.ops.memory_efficient_attention. This kernel is selected automatically depending on the inputs and works on any GPU after P100 [facebookresearch#362]
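The general idea behind memory-efficient attention can be sketched as an "online softmax": keys and values are processed in chunks while a running maximum and running sum are kept, so the full score matrix never has to be materialized. The pure-Python sketch below handles a single query vector and checks itself against naive softmax attention. It illustrates the technique only and is not the CUTLASS kernel; naive_attention and chunked_attention are hypothetical names, and chunk_size is an illustrative parameter.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values, scale):
    # Reference: softmax(scale * q.K^T) @ V, materializing every score at once.
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def chunked_attention(q, keys, values, scale, chunk_size=2):
    # Online softmax: only one chunk of scores exists at a time.
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [scale * dot(q, k) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new maximum
        # (math.exp(-inf) == 0.0 on the first chunk).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

A full attention layer would repeat this for every query (and every head); real kernels additionally tile over queries and keep the chunks in on-chip memory, which this sketch does not attempt to model.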

[0.0.12] - 2022-08-08

Fixed

Added

[0.0.11] - 2022-05-30

Fixed

Added

[0.0.10] - 2022-03-14

Fixed

Added

[0.0.9] - 2022-02-09

Added

Fixed

[0.0.8] - 2022-01-07

Fixed

Added

[0.0.7] - 2021-11-30

Fixed

[0.0.6] - 2021-11-24

Fixed

Added

[0.0.4] - 2021-11-16

Fixed

Added

[0.0.3] - 2021-11-01

Fixed

[0.0.2] - 2021-11-01

Fixed

Added