Docstring check (huggingface#26052)
* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py

Co-authored-by: Arthur <[email protected]>

* Update utils/check_docstrings.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <[email protected]>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py

Co-authored-by: Lysandre Debut <[email protected]>

* Address review comment

---------

Co-authored-by: Arthur <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
3 people authored Oct 4, 2023
1 parent 122b265 commit 03af4c4
Showing 155 changed files with 1,821 additions and 488 deletions.
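
At the center of the change is the new `utils/check_docstrings.py` utility, which compares each public object's signature against its documented `Args` and can rewrite stale annotations under `--fix_and_overwrite`. As a rough illustration of the kind of comparison involved (a simplified sketch, not the actual utility):

```python
import inspect
import re

def find_docstring_problems(obj):
    """Sketch: flag signature arguments whose docstring entry is missing or stale."""
    doc = inspect.getdoc(obj) or ""
    problems = []
    for name, param in inspect.signature(obj).parameters.items():
        if name in ("self", "args", "kwargs"):
            continue
        # Match an entry such as:  name (`int`, *optional*, defaults to 2):
        entry = re.search(rf"^\s*{re.escape(name)} \((.+?)\):", doc, flags=re.MULTILINE)
        if entry is None:
            problems.append(f"`{name}` is not documented")
        elif param.default is not inspect.Parameter.empty and "*optional*" not in entry.group(1):
            problems.append(f"`{name}` has a default but is not marked *optional*")
    return problems
```
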
1 change: 1 addition & 0 deletions .circleci/config.yml
@@ -209,6 +209,7 @@ jobs:
- run: make deps_table_check_updated
- run: python utils/update_metadata.py --check-only
- run: python utils/check_task_guides.py
+ - run: python utils/check_docstrings.py

workflows:
version: 2
2 changes: 2 additions & 0 deletions Makefile
@@ -43,6 +43,7 @@ repo-consistency:
python utils/check_doctest_list.py
python utils/update_metadata.py --check-only
python utils/check_task_guides.py
+ python utils/check_docstrings.py

# this target runs checks on all files

@@ -82,6 +83,7 @@ fix-copies:
python utils/check_dummies.py --fix_and_overwrite
python utils/check_doctest_list.py --fix_and_overwrite
python utils/check_task_guides.py --fix_and_overwrite
+ python utils/check_docstrings.py --fix_and_overwrite

# Run tests for the library

1 change: 1 addition & 0 deletions docs/source/en/pr_checks.md
@@ -124,6 +124,7 @@ This checks that:
- The translations of the READMEs and the index of the doc have the same model list as the main README (performed by `utils/check_copies.py`)
- The auto-generated tables in the documentation are up to date (performed by `utils/check_table.py`)
- The library has all objects available even if not all optional dependencies are installed (performed by `utils/check_dummies.py`)
+ - All docstrings properly document the arguments in the signature of the object (performed by `utils/check_docstrings.py`)

Should this check fail, the first two items require manual fixing, the last four can be fixed automatically for you by running the command

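For reference, the docstring convention the check enforces — visible throughout the diffs below — marks every argument that has a default as `*optional*` and spells out that default. A hypothetical function documented in the expected style:

```python
def pad_sequences(sequences, max_length=None, pad_token_id=0):
    """
    Pad a batch of token-id sequences to a common length.

    Args:
        sequences (`List[List[int]]`):
            The sequences to pad.
        max_length (`int`, *optional*):
            Length to pad to. If unset, pads to the longest sequence in the batch.
        pad_token_id (`int`, *optional*, defaults to 0):
            The id inserted as padding.
    """
    max_length = max_length or max(len(seq) for seq in sequences)
    return [seq + [pad_token_id] * (max_length - len(seq)) for seq in sequences]
```
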
1 change: 1 addition & 0 deletions src/transformers/configuration_utils.py
@@ -47,6 +47,7 @@


class PretrainedConfig(PushToHubMixin):
+ # no-format
r"""
Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as
methods for loading/downloading/saving configurations.
10 changes: 5 additions & 5 deletions src/transformers/data/data_collator.py
@@ -90,7 +90,7 @@ class DefaultDataCollator(DataCollatorMixin):
helpful if you need to set a return_tensors value at initialization.
Args:
- return_tensors (`str`):
+ return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""

@@ -235,7 +235,7 @@ class DataCollatorWithPadding:
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >=
7.5 (Volta).
- return_tensors (`str`):
+ return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""

@@ -288,7 +288,7 @@ class DataCollatorForTokenClassification(DataCollatorMixin):
7.5 (Volta).
label_pad_token_id (`int`, *optional*, defaults to -100):
The id to use when padding the labels (-100 will be automatically ignore by PyTorch loss functions).
- return_tensors (`str`):
+ return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""

@@ -521,7 +521,7 @@ class DataCollatorForSeq2Seq:
Args:
tokenizer ([`PreTrainedTokenizer`] or [`PreTrainedTokenizerFast`]):
The tokenizer used for encoding the data.
- model ([`PreTrainedModel`]):
+ model ([`PreTrainedModel`], *optional*):
The model that is being trained. If set and has the *prepare_decoder_input_ids_from_labels*, use it to
prepare the *decoder_input_ids*
@@ -544,7 +544,7 @@ class DataCollatorForSeq2Seq:
7.5 (Volta).
label_pad_token_id (`int`, *optional*, defaults to -100):
The id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).
- return_tensors (`str`):
+ return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""

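Several of the fixes above simply record that `return_tensors` is optional and already defaults to `"pt"`. A short usage sketch of the standard API, with the default overridden:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
features = [tokenizer("a short example"), tokenizer("a somewhat longer example")]

collator = DataCollatorWithPadding(tokenizer)                          # PyTorch tensors (the "pt" default)
np_collator = DataCollatorWithPadding(tokenizer, return_tensors="np")  # NumPy arrays instead

batch = np_collator(features)  # input_ids / attention_mask, padded to a common length
```
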
2 changes: 1 addition & 1 deletion src/transformers/feature_extraction_utils.py
@@ -65,7 +65,7 @@ class BatchFeature(UserDict):
This class is derived from a python dictionary and can be used as a dictionary.
Args:
- data (`dict`):
+ data (`dict`, *optional*):
Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask',
etc.).
tensor_type (`Union[None, str, TensorType]`, *optional*):
5 changes: 3 additions & 2 deletions src/transformers/generation/beam_constraints.py
@@ -263,8 +263,9 @@ class DisjunctiveConstraint(Constraint):
A special [`Constraint`] that is fulfilled by fulfilling just one of several constraints.
Args:
- nested_token_ids (`List[List[int]]`): a list of words, where each word is a list of ids. This constraint
-     is fulfilled by generating just one from the list of words.
+ nested_token_ids (`List[List[int]]`):
+     A list of words, where each word is a list of ids. This constraint is fulfilled by generating just one from
+     the list of words.
"""

def __init__(self, nested_token_ids: List[List[int]]):
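
A minimal usage sketch for the constraint documented above (the token ids are made up for illustration):

```python
from transformers import DisjunctiveConstraint

# Satisfied as soon as generation produces either id sequence,
# e.g. two different tokenizations of the same word.
constraint = DisjunctiveConstraint(nested_token_ids=[[8873, 904], [8873, 906]])

# Typically used via: model.generate(input_ids, constraints=[constraint], num_beams=4)
```
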
4 changes: 2 additions & 2 deletions src/transformers/generation/beam_search.py
@@ -152,7 +152,7 @@ class BeamSearchScorer(BeamScorer):
num_beam_hyps_to_keep (`int`, *optional*, defaults to 1):
The number of beam hypotheses that shall be returned upon calling
[`~transformer.BeamSearchScorer.finalize`].
- num_beam_groups (`int`):
+ num_beam_groups (`int`, *optional*, defaults to 1):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
max_length (`int`, *optional*):
@@ -437,7 +437,7 @@ class ConstrainedBeamSearchScorer(BeamScorer):
num_beam_hyps_to_keep (`int`, *optional*, defaults to 1):
The number of beam hypotheses that shall be returned upon calling
[`~transformer.BeamSearchScorer.finalize`].
- num_beam_groups (`int`):
+ num_beam_groups (`int`, *optional*, defaults to 1):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
max_length (`int`, *optional*):
1 change: 1 addition & 0 deletions src/transformers/generation/configuration_utils.py
@@ -38,6 +38,7 @@


class GenerationConfig(PushToHubMixin):
+ # no-format
r"""
Class that holds a configuration for a generation task. A `generate` call supports the following generation methods
for text-decoder, text-to-text, speech-to-text, and vision-to-text models:
4 changes: 2 additions & 2 deletions src/transformers/generation/flax_logits_process.py
@@ -120,7 +120,7 @@ class FlaxTopPLogitsWarper(FlaxLogitsWarper):
top_p (`float`):
If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
higher are kept for generation.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -163,7 +163,7 @@ class FlaxTopKLogitsWarper(FlaxLogitsWarper):
Args:
top_k (`int`):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
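
The recurring `filter_value` correction above reflects what these warpers actually do: logits outside the kept set are overwritten with `-inf`, so softmax assigns them zero probability. A simplified top-k sketch in PyTorch terms (not the library's exact code):

```python
import torch

def top_k_warp(scores, top_k=50, filter_value=float("-inf"), min_tokens_to_keep=1):
    top_k = max(top_k, min_tokens_to_keep)
    # The threshold is the k-th largest logit; everything below it is filtered out.
    threshold = torch.topk(scores, top_k).values[..., -1, None]
    return scores.masked_fill(scores < threshold, filter_value)

logits = torch.tensor([[1.0, 3.0, 0.5, 2.0]])
print(top_k_warp(logits, top_k=2))  # tensor([[-inf, 3.0000, -inf, 2.0000]])
```
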
25 changes: 11 additions & 14 deletions src/transformers/generation/logits_process.py
@@ -357,7 +357,7 @@ class TopPLogitsWarper(LogitsWarper):
top_p (`float`):
If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
higher are kept for generation.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -419,7 +419,7 @@ class TopKLogitsWarper(LogitsWarper):
Args:
top_k (`int`):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -447,9 +447,9 @@ class TypicalLogitsWarper(LogitsWarper):
Generation](https://arxiv.org/abs/2202.00666) for more information.
Args:
- mass (`float`):
+ mass (`float`, *optional*, defaults to 0.9):
Value of typical_p between 0 and 1 inclusive, defaults to 0.9.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -499,7 +499,7 @@ class EpsilonLogitsWarper(LogitsWarper):
Args:
epsilon (`float`):
If set to > 0, only the most tokens with probabilities `epsilon` or higher are kept for generation.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -572,7 +572,7 @@ class EtaLogitsWarper(LogitsWarper):
epsilon (`float`):
A float value in the range (0, 1). Hyperparameter used to calculate the dynamic cutoff value, `eta`. The
suggested values from the paper ranges from 3e-4 to 4e-3 depending on the size of the model.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All values that are found to be below the dynamic cutoff value, `eta`, are set to this float value. This
parameter is useful when logits need to be modified for very low probability tokens that should be excluded
from generation entirely.
@@ -1600,18 +1600,15 @@ class UnbatchedClassifierFreeGuidanceLogitsProcessor(LogitsProcessor):
Higher guidance scale encourages the model to generate samples that are more closely linked to the input
prompt, usually at the expense of poorer quality. A value smaller than 1 has the opposite effect, while
making the negative prompt provided with negative_prompt_ids (if any) act as a positive prompt.
+ model (`PreTrainedModel`):
+     The model computing the unconditional scores. Supposedly the same as the one computing the conditional
+     scores. Both models must use the same tokenizer.
unconditional_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Indices of input sequence tokens in the vocabulary for the unconditional branch. If unset, will default to
the last token of the prompt.
- unconditional_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, **optional**):
+ unconditional_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Attention mask for unconditional_ids.
- model (`PreTrainedModel`):
-     The model computing the unconditional scores. Supposedly the same as the one computing the conditional
-     scores. Both models must use the same tokenizer.
- smooth_factor (`float`, **optional**):
-     The interpolation weight for CFG Rescale. 1 means no rescaling, 0 reduces to the conditional scores without
-     CFG. Turn it lower if the output degenerates.
- use_cache (`bool`, **optional**):
+ use_cache (`bool`, *optional*, defaults to `True`):
Whether to cache key/values during the negative prompt forward pass.
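
As the `guidance_scale` description explains, classifier-free guidance pushes the conditional scores away from the unconditional ones. A stripped-down sketch of that combination step (the real processor additionally handles caching and the unconditional forward pass):

```python
import torch.nn.functional as F

def cfg_combine(cond_logits, uncond_logits, guidance_scale):
    cond = F.log_softmax(cond_logits, dim=-1)
    uncond = F.log_softmax(uncond_logits, dim=-1)
    # guidance_scale > 1 sharpens toward the prompt; < 1 leans toward the negative prompt.
    return uncond + guidance_scale * (cond - uncond)
```
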
2 changes: 1 addition & 1 deletion src/transformers/generation/stopping_criteria.py
@@ -49,7 +49,7 @@ class MaxLengthCriteria(StoppingCriteria):
Args:
max_length (`int`):
The maximum length that the output sequence can have in number of tokens.
- max_position_embeddings (`int`, `optional`):
+ max_position_embeddings (`int`, *optional*):
The maximum model length, as defined by the model's `config.max_position_embeddings` attribute.
"""

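A quick usage sketch for the criterion documented above (parameter values are arbitrary):

```python
from transformers import MaxLengthCriteria, StoppingCriteriaList

stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=64)])
# outputs = model.generate(input_ids, stopping_criteria=stopping_criteria)
```
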
4 changes: 2 additions & 2 deletions src/transformers/generation/tf_logits_process.py
@@ -122,7 +122,7 @@ class TFTopKLogitsWarper(TFLogitsWarper):
Args:
top_k (`int`):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -151,7 +151,7 @@ class TFTopPLogitsWarper(TFLogitsWarper):
top_p (`float`):
If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
higher are kept for generation.
- filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+ filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
4 changes: 2 additions & 2 deletions src/transformers/models/align/configuration_align.py
@@ -71,6 +71,8 @@ class AlignTextConfig(PretrainedConfig):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
+ pad_token_id (`int`, *optional*, defaults to 0):
+     Padding token id.
position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
@@ -80,8 +82,6 @@ class AlignTextConfig(PretrainedConfig):
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
- pad_token_id (`int`, *optional*, defaults to 0)
-     Padding token id.
Example:
2 changes: 1 addition & 1 deletion src/transformers/models/altclip/configuration_altclip.py
@@ -259,7 +259,7 @@ class AltCLIPConfig(PretrainedConfig):
Dictionary of configuration options used to initialize [`AltCLIPTextConfig`].
vision_config (`dict`, *optional*):
Dictionary of configuration options used to initialize [`AltCLIPVisionConfig`].
- projection_dim (`int`, *optional*, defaults to 512):
+ projection_dim (`int`, *optional*, defaults to 768):
Dimentionality of text and vision projection layers.
logit_scale_init_value (`float`, *optional*, defaults to 2.6592):
The inital value of the *logit_scale* paramter. Default is used as per the original CLIP implementation.
4 changes: 2 additions & 2 deletions src/transformers/models/altclip/processing_altclip.py
@@ -30,9 +30,9 @@ class AltCLIPProcessor(ProcessorMixin):
the [`~AltCLIPProcessor.__call__`] and [`~AltCLIPProcessor.decode`] for more information.
Args:
- image_processor ([`CLIPImageProcessor`]):
+ image_processor ([`CLIPImageProcessor`], *optional*):
The image processor is a required input.
- tokenizer ([`XLMRobertaTokenizerFast`]):
+ tokenizer ([`XLMRobertaTokenizerFast`], *optional*):
The tokenizer is a required input.
"""
attributes = ["image_processor", "tokenizer"]
src/transformers/models/audio_spectrogram_transformer/configuration_audio_spectrogram_transformer.py
@@ -51,15 +51,15 @@ class ASTConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
- hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+ attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
- patch_size (`int`, *optional*, defaults to `16`):
+ patch_size (`int`, *optional*, defaults to 16):
The size (resolution) of each patch.
qkv_bias (`bool`, *optional*, defaults to `True`):
Whether to add a bias to the queries, keys and values.
2 changes: 1 addition & 1 deletion src/transformers/models/bark/processing_bark.py
@@ -38,7 +38,7 @@ class BarkProcessor(ProcessorMixin):
Args:
tokenizer ([`PreTrainedTokenizer`]):
An instance of [`PreTrainedTokenizer`].
- speaker_embeddings (`Dict[Dict[str]]`, *optional*, defaults to `None`):
+ speaker_embeddings (`Dict[Dict[str]]`, *optional*):
Optional nested speaker embeddings dictionary. The first level contains voice preset names (e.g
`"en_speaker_4"`). The second level contains `"semantic_prompt"`, `"coarse_prompt"` and `"fine_prompt"`
embeddings. The values correspond to the path of the corresponding `np.ndarray`. See
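
Sketched as a literal, the nested layout described above looks roughly like this (the preset name and paths are illustrative):

```python
# Hypothetical speaker_embeddings dict following the documented two-level layout
speaker_embeddings = {
    "en_speaker_4": {
        "semantic_prompt": "en_speaker_4/semantic_prompt.npy",
        "coarse_prompt": "en_speaker_4/coarse_prompt.npy",
        "fine_prompt": "en_speaker_4/fine_prompt.npy",
    },
}
```
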
2 changes: 0 additions & 2 deletions src/transformers/models/barthez/tokenization_barthez.py
@@ -97,8 +97,6 @@ class BarthezTokenizer(PreTrainedTokenizer):
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
- additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
-     Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
2 changes: 0 additions & 2 deletions src/transformers/models/bartpho/tokenization_bartpho.py
@@ -92,8 +92,6 @@ class BartphoTokenizer(PreTrainedTokenizer):
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
- additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
-     Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
2 changes: 1 addition & 1 deletion src/transformers/models/beit/configuration_beit.py
@@ -41,7 +41,7 @@ class BeitConfig(PretrainedConfig):
[microsoft/beit-base-patch16-224-pt22k](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k) architecture.
Args:
- vocab_size (`int`, *optional*, defaults to 8092):
+ vocab_size (`int`, *optional*, defaults to 8192):
Vocabulary size of the BEiT model. Defines the number of different image tokens that can be used during
pre-training.
hidden_size (`int`, *optional*, defaults to 768):