Skip to content

Latest commit

 

History

History
89 lines (67 loc) · 2.88 KB

CHANGELOG.md

File metadata and controls

89 lines (67 loc) · 2.88 KB

Changelog

[0.14.4] - 2022-10-31

Added

  • Python 3.10 and 3.11 wheels!
    • Only for supported torch versions.
  • Support torch 1.13.
  • Tiled NA2D for 3x3 kernels.

Changed

  • Minor changes to the setup script to fix pip install natten.

[0.14.2] - 2022-10-15

Added

  • CPU support!
    • CPP backend for CPU computation.
    • CPU-only builds now supported.
    • Note we only have naive kernels for CPU at the moment. Feel free to open a PR!

Changed

  • Refactored the CPP/CUDA backend.
  • Unit tests for NA1D and NA2D
    • Gradcheck tests in slow and fast mode
    • Gradcheck tests for CPU backend
    • Allclose tests between CPU and CUDA outputs and gradients

[0.14.1] - 2022-10-08

Added

  • NATTEN is now available through PyPI and comes with pre-compiled wheels.

Changed

  • Refactored the CPP/CUDA backend.
  • Dropped LegacyNeighborhoodAttention (pure torch implementation) in favor of the upcoming CPU implementation.

[0.13] - 2022-09-29

Added

  • Added dilation support to all kernels, including tiled NA kernels.

Changed

  • Renamed NeighborhoodAttention to NeighborhoodAttention2D in the python interface.
    • NeighborhoodAttention is now deprecated and will be removed in future versions.
  • Renamed NeighborhoodAttention to NeighborhoodAttention2D in the python interface.
    • NeighborhoodAttention is now deprecated and will be removed in future versions.

[0.12] - 2022-07-09

Added

  • Fixed the race condition in K-Backward and V-Backward kernels.
    • This was handled previously with atomic adds (non-deterministic).
    • Now the kernels compute inverse neighbors and compute gradients without a race condition.
  • "Tiled" Neighborhood Attention kernels for 5x5, 7x7, 9x9, 11x11, and 13x13 window sizes.
    • Applies only to QKRPB-Forward and A-Backward.
    • Only supports dim per head = 32 for now.
      • Try to keep your channels a multiple of 32.
  • Improved FP16 support.
    • Specific kernels for FP16 that use half2 addressing.
    • Different threads per block for FP16.
  • New 1D NA kernels
    • Slightly more efficient.
    • With FP16 support.
    • Arbitrary kernel sizes supported (there's no upper bound of 13x13 like the 2D version).

Changed

  • Window size templating, and some other common bits are factored out into a commons header file.
    • Makes extending support and editing easier.
  • Gradchecks now run in fast mode (faster and less memory usage) unless instructed otherwise.

[0.11a] - 2022-05-12

Added

  • 1D Neighborhood Attention

[0.11] - 2022-04-30

Changed

  • Classification and downstream kernels combined.
  • Refactored cuda extension to natten/.
  • Minor speed improvements.

[0.10]

Initial public release.