Fused neighborhood attention (#111)
* Fused neighborhood attention (FNA) kernels (forward pass only for now)
  * 1D, 2D and 3D Neighborhood Attention are supported,
  * Causal neighborhood attention is implemented,
  * Window (kernel) size, dilation, and causality can be defined *per-axis*,
  * All GPU architectures since Maxwell (SM50) are supported,
    * SM50 up to SM70 are SIMT-only, but support both FP16 and FP32,
    * SM70 and SM75 target Tensor Cores in FP16, and SIMT-style in FP32,
    * SM80 and above target Tensor Cores in FP16, BF16, and FP32.
  * Relative positional biases are implemented (not defined for causal masking yet),
  * Memory layout in FNA is different from existing kernels (`[B, *, heads, dim]` instead of `[B, heads, *, dim]`).
    * Eventually this layout can skip over the permute/explicit reshape step in the attention module following the QKV projection.
* Naive kernels now implement and allow causal masking,
* Naive kernels (CPU and CUDA) now allow varying parameters (window size, dilation, causal) across axes,
* Major bug fix in Volta GEMM kernels
  * The epilogue was different for Volta, and it slipped through unit tests,
  * Tests are now more aggressive, and the issue has been fixed.
* Minor torch bug fixed
  * Streams were not being selected correctly if users set a tensor to a device other than cuda:0. Thanks to @AdityaKane2001 for discovering it.
* Documentation (finally):
  * Better late than never, but finally added more documentation and reorganized docs under docs/ instead of shoving everything into the readme.
* So much more that I forgot (in part due to lack of documentation).
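The memory-layout point above can be sketched in plain PyTorch. This is an illustrative standalone snippet (the shapes and the `qkv` projection here are hypothetical, not NATTEN's actual module): with the `[B, *, heads, dim]` layout, a plain reshape after the QKV projection already produces the layout the fused kernel consumes, whereas the `[B, heads, *, dim]` layout needs an extra permute (and typically a copy).

```python
import torch

# Illustrative sizes (not tied to any NATTEN default).
B, H, W, heads, dim = 2, 8, 8, 4, 32

qkv = torch.nn.Linear(heads * dim, 3 * heads * dim)
x = torch.randn(B, H, W, heads * dim)

# Existing (unfused) kernels expect [B, heads, *, dim]: after the QKV
# projection, an explicit permute is required.
q_old = qkv(x)[..., : heads * dim].reshape(B, H, W, heads, dim)
q_old = q_old.permute(0, 3, 1, 2, 4)  # [B, heads, H, W, dim]

# FNA layout is [B, *, heads, dim]: the reshape alone suffices, so the
# permute step (and the copy it usually triggers) can be skipped.
q_new = qkv(x)[..., : heads * dim].reshape(B, H, W, heads, dim)

assert q_old.shape == (B, heads, H, W, dim)
assert q_new.shape == (B, H, W, heads, dim)
```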
alihassanijr authored Mar 8, 2024
1 parent bdee155 commit 9b99173
Showing 254 changed files with 133,543 additions and 55,740 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,24 @@
# Changelog

## [Main branch]
* Fused neighborhood attention (FNA) kernels (forward pass only for now)
* 1D, 2D and 3D Neighborhood Attention are supported,
* Causal neighborhood attention is implemented,
* Window (kernel) size, dilation, and causality can be defined *per-axis*,
* All GPU architectures since Maxwell (SM50) are supported,
* SM50 up to SM70 are SIMT-only, but support both FP16 and FP32,
* SM70 and SM75 target Tensor Cores in FP16, and SIMT-style in FP32,
* SM80 and above target Tensor Cores in FP16, BF16, and FP32.
* Relative positional biases are implemented (not defined for causal masking yet),
* Memory layout in FNA is different from existing kernels (`[B, *, heads, dim]` instead of `[B, heads, *, dim]`).
* Eventually this layout can skip over the permute/explicit reshape step in the attention module following
the QKV projection.
* Naive kernels now implement and allow causal masking,
* Naive kernels (CPU and CUDA) now allow varying parameters (window size, dilation, causal) across axes,
* Major bug fix in Volta GEMM kernels
* The epilogue was different for Volta, and it slipped through unit tests,
* Tests are now more aggressive, and the issue has been fixed.
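The causal and per-axis window semantics listed above can be sketched with a small helper. This is my own illustrative function under assumed semantics (standard neighborhood-attention windowing, dilation omitted for brevity), not a NATTEN API: along each axis independently, a non-causal window is centered on the query and shifted, not shrunk, at the boundaries, while a causal window covers only positions at or before the query.

```python
def na1d_window_start(i, length, kernel_size, causal):
    """Leftmost index of query i's neighborhood along one axis.

    Illustrative helper (not part of NATTEN). In 2D/3D, this logic
    would run independently per axis, each axis with its own
    kernel_size and causal flag.
    """
    if causal:
        # Causal: window is [i - kernel_size + 1, i], truncated at the
        # left edge (so early queries see fewer than kernel_size keys).
        return max(i - kernel_size + 1, 0)
    # Non-causal: window centered on i, shifted to stay in bounds so it
    # always covers exactly kernel_size positions.
    start = i - kernel_size // 2
    return min(max(start, 0), length - kernel_size)

# Interior query: centered window.
assert na1d_window_start(5, 10, 3, causal=False) == 4
# Boundary queries: window shifts instead of shrinking.
assert na1d_window_start(0, 10, 3, causal=False) == 0
assert na1d_window_start(9, 10, 3, causal=False) == 7
# Causal: window ends at the query.
assert na1d_window_start(5, 10, 3, causal=True) == 3
```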

## [0.15.1] - 2024-01-24
* Attention tensors can now be views, which allows combining neighborhood and any other attention pattern (i.e. registers,
cross attention tokens, and the like) without extra copies. ([#85](https://github.com/SHI-Labs/NATTEN/pull/85) and [#87](https://github.com/SHI-Labs/NATTEN/pull/87)).
34 changes: 34 additions & 0 deletions LICENSE
@@ -19,3 +19,37 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Fused Neighborhood Attention kernels are heavily based on the memory-efficient
attention kernels from the xFormers project by Meta Platforms, Inc.

Copyright (c) Facebook, Inc. and its affiliates

BSD 3-Clause License

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
and IDIAP Research Institute nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
2 changes: 1 addition & 1 deletion Makefile
@@ -56,7 +56,7 @@ install:
NATTEN_CUDA_ARCH="${CUDA_ARCH}" NATTEN_N_WORKERS="${WORKERS}" NATTEN_VERBOSE="${VERBOSE}" pip install -v -e . 2>&1 | tee install.out

test:
pytest -v -x ./tests
PYTORCH_NO_CUDA_MEMORY_CACHING=1 pytest -v -x ./tests

style:
ufmt format $(check_dirs)
