Update from pytorch_pretrained_bert - HuggingFace (pytorch#40)

* Update BERT examples
* Update GPT examples
* Add Transformer-XL
* fix branch to master on pytorch_pretrained_bert for Transformer-XL
* adding accelerator tag for Transformer-XL
* Add GPT-2
* Space in summary title
* cleaning
* Update huggingface_pytorch-pretrained-bert_transformerXL.md
* Update huggingface_pytorch-pretrained-bert_gpt2.md
* Update huggingface_pytorch-pretrained-bert_transformerXL.md
* bugfix
* bugfix
* bugfix
1 parent 6ed75c1, commit 2c3b932, showing 4 changed files with 308 additions and 4 deletions.
`huggingface_pytorch-pretrained-bert_gpt2.md`:

---
layout: hub_detail
background-class: hub-background
body-class: hub
title: GPT-2
summary: Language Models are Unsupervised Multitask Learners
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
---

### Model Description

GPT-2 was released together with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford et al. at OpenAI. It is a development of [GPT](https://github.com/pytorch/hub/blob/master/huggingface_pytorch-pretrained-bert_gpt.md), introduced in [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf), and demonstrates the impressive natural language generation abilities of large language models, along with their ability to perform reasonably well on a diverse range of tasks in a zero-shot setting.

Here are three models based on OpenAI's pre-trained weights, along with the associated tokenizer:
- `gpt2Model`: raw OpenAI GPT-2 Transformer model (fully pre-trained)
- `gpt2LMHeadModel`: OpenAI GPT-2 Transformer with the tied language modeling head on top (fully pre-trained)
- `gpt2DoubleHeadsModel`: OpenAI GPT-2 Transformer with the tied language modeling head and a multiple choice classification head on top (the OpenAI GPT-2 Transformer is pre-trained, the multiple choice classification head is only initialized and has to be trained)

Note that two versions of GPT-2 are available: the small version (`gpt2`: English model with 12 layers, 768 hidden units, 12 heads, 117M parameters) and the medium version (`gpt2-medium`: English model with 24 layers, 1024 hidden units, 16 heads, 345M parameters).
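
For example, once the requirements below are installed, the medium checkpoint can be loaded by passing the `gpt2-medium` identifier to the same entry points. This is a minimal illustrative sketch, assuming `gpt2-medium` is accepted the same way `gpt2` is in the example further down:

```python
import torch

# Assumed usage: the 'gpt2-medium' identifier selects the 345M-parameter checkpoint
medium_tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Tokenizer', 'gpt2-medium')
medium_lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2LMHeadModel', 'gpt2-medium')
medium_lm_model.eval()
```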

### Requirements

Unlike most other PyTorch Hub models, GPT-2 requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex
```

Using `python3` is recommended for these models, in particular for the tokenizer.

### Example

Here is an example of how to tokenize text with `gpt2Tokenizer`, get the hidden states computed by `gpt2Model`, and predict the next token with `gpt2LMHeadModel`. Finally, we show how to use `gpt2DoubleHeadsModel` to combine the language modeling head with a multiple choice classification head.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Tokenizer', 'gpt2')

# Prepare tokenized input
text_1 = "Who was Jim Henson ? Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_1 = tokenizer.tokenize(text_1)
tokenized_text_2 = tokenizer.tokenize(text_2)
indexed_tokens1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
indexed_tokens2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)
tokens_tensor_1 = torch.tensor([indexed_tokens1])
tokens_tensor_2 = torch.tensor([indexed_tokens2])


### Get the hidden states computed by `gpt2Model`
#################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Model', 'gpt2')
model.eval()

# Predict the hidden states for each layer
# `past` can be used to reuse precomputed hidden states in subsequent predictions
with torch.no_grad():
    hidden_states_1, past = model(tokens_tensor_1)
    hidden_states_2, past = model(tokens_tensor_2, past=past)


### Predict the next token using `gpt2LMHeadModel`
##################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2LMHeadModel', 'gpt2')
lm_model.eval()

# Predict the language modeling logits for each position in the two sequences
with torch.no_grad():
    predictions_1, past = lm_model(tokens_tensor_1)
    predictions_2, past = lm_model(tokens_tensor_2, past=past)

# Get the predicted last token
predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
predicted_token = tokenizer.decode([predicted_index])
assert predicted_token == ' who'


### Language modeling and multiple choice classification with `gpt2DoubleHeadsModel`
#####################################################################################
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2DoubleHeadsModel', 'gpt2')
double_head_model.eval()  # Evaluation mode; switch to train mode if used for training

# Build a batch of shape [batch_size, num_choices, sequence_length];
# note that both choices must have the same length (pad the shorter one if needed)
tokens_tensor = torch.tensor([[indexed_tokens1, indexed_tokens2]])
# Index of the token in each choice used by the multiple choice classification head
mc_token_ids = torch.LongTensor([[len(tokenized_text_1) - 1, len(tokenized_text_2) - 1]])

with torch.no_grad():
    lm_logits, multiple_choice_logits, presents = double_head_model(tokens_tensor, mc_token_ids)
```
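
Building on the objects defined above, the `past` cache can also be used for simple greedy generation. The following is a minimal illustrative sketch (not part of the original example), assuming the `(logits, past)` return signature shown above:

```python
# Greedy generation sketch: feed only the newly generated token back in,
# carrying the `past` cache forward at each step.
generated = tokens_tensor_1
next_input = tokens_tensor_1
past = None
with torch.no_grad():
    for _ in range(10):  # generate 10 tokens
        logits, past = lm_model(next_input, past=past)
        next_token_id = torch.argmax(logits[0, -1, :]).item()
        next_input = torch.tensor([[next_token_id]])
        generated = torch.cat([generated, next_input], dim=1)

print(tokenizer.decode(generated[0].tolist()))
```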

### Resources

- Paper: [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/)
- [Blogpost from OpenAI](https://openai.com/blog/better-language-models/)
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)

`huggingface_pytorch-pretrained-bert_transformerXL.md`:

---
layout: hub_detail
background-class: hub-background
body-class: hub
title: Transformer-XL
summary: Attentive Language Models Beyond a Fixed-Length Context
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
---

### Model Description

Transformer-XL was released together with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](http://arxiv.org/abs/1901.02860) by Zihang Dai et al. This PyTorch implementation of Transformer-XL is an adaptation of the original [PyTorch implementation](https://github.com/kimiyoung/transformer-xl), slightly modified to match the performance of the TensorFlow implementation and to allow re-use of the pretrained weights.

Here are two models based on the authors' pre-trained weights, along with the associated tokenizer:
- `transformerXLModel`: Transformer-XL model which outputs the last hidden state and memory cells (fully pre-trained)
- `transformerXLLMHeadModel`: Transformer-XL with the tied adaptive softmax head on top for language modeling, which outputs the logits/loss and memory cells (fully pre-trained)

### Requirements

Unlike most other PyTorch Hub models, Transformer-XL requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex
```

### Example

Here is an example of how to tokenize text with `transformerXLTokenizer`, get the hidden states computed by `transformerXLModel`, and predict the next token with `transformerXLLMHeadModel`.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLTokenizer', 'transfo-xl-wt103')

# Prepare tokenized input
text_1 = "Who was Jim Henson ?"
text_2 = "Jim Henson was a puppeteer"
tokenized_text_1 = tokenizer.tokenize(text_1)
tokenized_text_2 = tokenizer.tokenize(text_2)
indexed_tokens_1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
indexed_tokens_2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)
tokens_tensor_1 = torch.tensor([indexed_tokens_1])
tokens_tensor_2 = torch.tensor([indexed_tokens_2])


### Get the hidden states computed by `transformerXLModel`
##########################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLModel', 'transfo-xl-wt103')
model.eval()

# Predict the hidden states for each layer
# `mems` can be used to reuse precomputed hidden states in subsequent predictions
with torch.no_grad():
    hidden_states_1, mems_1 = model(tokens_tensor_1)
    hidden_states_2, mems_2 = model(tokens_tensor_2, mems=mems_1)


### Predict the next token using `transformerXLLMHeadModel`
###########################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLLMHeadModel', 'transfo-xl-wt103')
lm_model.eval()

# Predict the language modeling logits for each position in the two sequences
with torch.no_grad():
    predictions_1, mems_1 = lm_model(tokens_tensor_1)
    predictions_2, mems_2 = lm_model(tokens_tensor_2, mems=mems_1)

# Get the predicted last token
predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'who'
```
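
The memory (`mems`) returned at each step is what lets Transformer-XL attend beyond a fixed-length context. As a minimal illustrative sketch (not part of the original example), a longer text can be fed in fixed-size segments while carrying `mems` forward, reusing the `tokenizer` and `lm_model` defined above:

```python
# Stream a longer text through the model in small segments, carrying the
# memory (`mems`) from one segment to the next.
long_text = "Who was Jim Henson ? Jim Henson was a puppeteer who created The Muppets"
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(long_text))

segment_len = 4
mems = None
with torch.no_grad():
    for start in range(0, len(ids), segment_len):
        segment = torch.tensor([ids[start:start + segment_len]])
        if mems is None:
            predictions, mems = lm_model(segment)
        else:
            predictions, mems = lm_model(segment, mems=mems)

# Logits of the last segment are conditioned on all previous segments via `mems`
predicted_index = torch.argmax(predictions[0, -1, :]).item()
print(tokenizer.convert_ids_to_tokens([predicted_index]))
```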

### Resources

- Paper: [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](http://arxiv.org/abs/1901.02860)
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)
- Original author's [implementation](https://github.com/kimiyoung/transformer-xl)