Commit
Add Nystromformer (huggingface#14659)
* Initial commit
* Config and modelling changes
  Added Nystromformer-specific attributes to config and removed all decoder functionality from modelling.
* Modelling and test changes
  Added Nystrom approximation and removed decoder tests.
* Code quality fixes
* Modeling changes and conversion script
  Initial commits to conversion script, modeling changes.
* Minor modeling changes and conversion script
* Modeling changes
* Correct modeling, add tests and documentation
* Code refactor
* Remove tokenizers
* Code refactor
* Update __init__.py
* Fix bugs
* Update src/transformers/__init__.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/__init__.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/__init__.py
  Co-authored-by: NielsRogge <[email protected]>
* Update docs/source/model_doc/nystromformer.mdx
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/configuration_nystromformer.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/configuration_nystromformer.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/configuration_nystromformer.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/configuration_nystromformer.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/convert_nystromformer_original_pytorch_checkpoint_to_pytorch.py
  Co-authored-by: NielsRogge <[email protected]>
* Update src/transformers/models/nystromformer/configuration_nystromformer.py
  Co-authored-by: NielsRogge <[email protected]>
* Update modeling and test_modeling
* Code refactor
* .rst to .mdx
* doc changes
* Doc changes
* Update modeling_nystromformer.py
* Doc changes
* Fix copies
* Apply suggestions from code review
  Co-authored-by: NielsRogge <[email protected]>
* Apply suggestions from code review
  Co-authored-by: NielsRogge <[email protected]>
* Update configuration_nystromformer.py
* Fix copies
* Update tests/test_modeling_nystromformer.py
  Co-authored-by: NielsRogge <[email protected]>
* Update test_modeling_nystromformer.py
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <[email protected]>
* Fix code style
* Update modeling_nystromformer.py
* Update modeling_nystromformer.py
* Fix code style
* Reformat modeling file
* Update modeling_nystromformer.py
* Modify NystromformerForMultipleChoice
* Fix code quality
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <[email protected]>
* Code style changes and torch.no_grad()
* make style
* Apply suggestions from code review

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
1 parent 444ea95, commit 28e0914
Showing 17 changed files with 1,970 additions and 0 deletions.
@@ -0,0 +1,68 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Nyströmformer

## Overview

The Nyströmformer model was proposed in [*Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention*](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn
Fung, Yin Li, and Vikas Singh.

The abstract from the paper is the following:

*Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component
that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or
dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the
input sequence length has limited its application to longer sequences -- a topic being actively studied in the
community. To address this limitation, we propose Nyströmformer -- a model that exhibits favorable scalability as a
function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention
with O(n) complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of
tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard
sequence length, and find that our Nyströmformer performs comparably, or in a few cases, even slightly better, than
standard self-attention. On longer sequence tasks in the Long Range Arena (LRA) benchmark, Nyströmformer performs
favorably relative to other efficient self-attention methods. Our code is available at this https URL.*

This model was contributed by [novice03](https://huggingface.co/novice03). The original code can be found [here](https://github.com/mlpen/Nystromformer).
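
To make the idea in the abstract concrete, here is a minimal NumPy sketch of a Nyström-style approximation of softmax attention for a single head. It is an illustration under stated assumptions, not the code added in this commit: the function names `nystrom_attention` and `full_attention` are hypothetical, landmarks are taken as plain segment means, the exact pseudoinverse from `np.linalg.pinv` stands in for the iterative approximation described in the paper, and other details of the actual model are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Standard softmax attention: builds an (n, n) score matrix.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def nystrom_attention(q, k, v, num_landmarks):
    # Hypothetical helper: Nystrom-style approximation for one head.
    # Assumes the sequence length is divisible by num_landmarks.
    n, d = q.shape
    assert n % num_landmarks == 0
    scale = 1.0 / np.sqrt(d)

    # Landmarks: means over contiguous segments of the queries and keys.
    q_l = q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    k_l = k.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)

    f = softmax(q @ k_l.T * scale)    # (n, m)
    a = softmax(q_l @ k_l.T * scale)  # (m, m)
    b = softmax(q_l @ k.T * scale)    # (m, n)

    # Exact pseudoinverse used here for simplicity (an assumption of this sketch).
    # Multiplying right to left avoids ever forming an (n, n) matrix.
    return f @ (np.linalg.pinv(a) @ (b @ v))


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
approx = nystrom_attention(q, k, v, num_landmarks=4)
exact = full_attention(q, k, v)
print("max abs difference:", np.abs(approx - exact).max())
```

With a fixed number of landmarks `m`, the three score matrices have `n*m`, `m*m`, and `m*n` entries, so the cost grows linearly in the sequence length `n` rather than quadratically. When `m` equals `n`, the landmarks are the tokens themselves and the sketch reduces to standard attention.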

## NystromformerConfig

[[autodoc]] NystromformerConfig

## NystromformerModel

[[autodoc]] NystromformerModel
    - forward

## NystromformerForMaskedLM

[[autodoc]] NystromformerForMaskedLM
    - forward

## NystromformerForSequenceClassification

[[autodoc]] NystromformerForSequenceClassification
    - forward

## NystromformerForMultipleChoice

[[autodoc]] NystromformerForMultipleChoice
    - forward

## NystromformerForTokenClassification

[[autodoc]] NystromformerForTokenClassification
    - forward

## NystromformerForQuestionAnswering

[[autodoc]] NystromformerForQuestionAnswering
    - forward
@@ -76,6 +76,7 @@
mobilebert,
mpnet,
mt5,
nystromformer,
openai,
pegasus,
perceiver,