Update from pytorch_pretrained_bert - HuggingFace (pytorch#40)
* Update BERT examples

* Update GPT examples

* Add Transformer-XL

* fix branch to master on pytorch_pretrained_bert for Transformer-XL

* adding accelerator tag for Transformer-XL

* Add GPT-2

* Space in summary title

* cleaning

* Update huggingface_pytorch-pretrained-bert_transformerXL.md

* Update huggingface_pytorch-pretrained-bert_gpt2.md

* Update huggingface_pytorch-pretrained-bert_transformerXL.md

* bugfix

* bugfix

* bugfix
VictorSanh authored and soumith committed Jun 19, 2019
1 parent 6ed75c1 commit 2c3b932
Showing 4 changed files with 308 additions and 4 deletions.
95 changes: 94 additions & 1 deletion huggingface_pytorch-pretrained-bert_bert.md
@@ -41,7 +41,7 @@ pip install tqdm boto3 requests regex

### Example

Here is an example of how to tokenize the input text with `bertTokenizer`, and then get the hidden states computed by `bertModel` or predict masked tokens using `bertForMaskedLM`. The example also includes snippets showcasing how to use `bertForNextSentencePrediction`, `bertForQuestionAnswering`, `bertForSequenceClassification`, `bertForMultipleChoice`, `bertForTokenClassification`, and `bertForPreTraining`.

```python
### First, tokenize the input
@@ -90,6 +90,99 @@ predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'Jim'
```

```python
### Classify next sentence using `bertForNextSentencePrediction`
# Going back to our initial input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])

nextSent_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForNextSentencePrediction', 'bert-base-cased')
nextSent_model.eval()

# Predict the next sentence classification logits
with torch.no_grad():
next_sent_classif_logits = nextSent_model(tokens_tensor, segments_tensors)
```

```python
### Question answering using `bertForQuestionAnswering`
questionAnswering_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForQuestionAnswering', 'bert-base-cased')
questionAnswering_model.eval()

# Predict the start and end positions logits
with torch.no_grad():
start_logits, end_logits = questionAnswering_model(tokens_tensor, segments_tensors)

# Or get the total loss which is the sum of the CrossEntropy loss for the start and end token positions (set model to train mode before if used for training)
start_positions, end_positions = torch.tensor([12]), torch.tensor([14])
question_answering_loss = questionAnswering_model(tokens_tensor, segments_tensors, start_positions=start_positions, end_positions=end_positions)
```
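
As a short, illustrative follow-up (not part of the snippet above), the start and end logits can be reduced to an answer span with a simple argmax over positions:

```python
# Illustrative only: pick the most likely start/end positions and slice the tokens.
answer_start = torch.argmax(start_logits, dim=-1).item()
answer_end = torch.argmax(end_logits, dim=-1).item()
answer_tokens = tokenized_text[answer_start:answer_end + 1]
print(answer_tokens)
```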

```python
### Classify sequence using `bertForSequenceClassification`
# Load bertForSequenceClassification
seqClassification_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForSequenceClassification', 'bert-base-cased', num_labels=2)
seqClassification_model.eval()

# Predict the sequence classification logits
with torch.no_grad():
seq_classif_logits = seqClassification_model(tokens_tensor, segments_tensors)

# Or get the sequence classification loss (set model to train mode before if used for training)
labels = torch.tensor([1])
seq_classif_loss = seqClassification_model(tokens_tensor, segments_tensors, labels=labels)
```
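
A minimal follow-up sketch showing how the sequence classification logits could be turned into class probabilities and a predicted label:

```python
# Illustrative only: convert logits to probabilities and a predicted class index.
import torch.nn.functional as F
seq_classif_probs = F.softmax(seq_classif_logits, dim=-1)
predicted_class = torch.argmax(seq_classif_logits, dim=-1).item()
```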

```python
### Sequence tagging using `bertForTokenClassification`
tokClassification_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForTokenClassification', 'bert-base-cased', num_labels=2)
tokClassification_model.eval()
# Predict the token classification logits
with torch.no_grad():
classif_logits = tokClassification_model(tokens_tensor, segments_tensors)

# Or get the token classification loss (set model to train mode before if used for training)
labels = torch.tensor([[0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]])
classif_loss = tokClassification_model(tokens_tensor, segments_tensors, labels=labels)
```

```python
### Select answer among multiple choice using `bertForMultipleChoice`
multiplChoice_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForMultipleChoice', 'bert-base-cased', num_choices=2)
multiplChoice_model.eval()

tokens_tensor = torch.tensor([[indexed_tokens, indexed_tokens]])
segments_tensors = torch.tensor([[segments_ids, segments_ids]])

# Predict the multiple choice logits
with torch.no_grad():
multiple_choice_logits = multiplChoice_model(tokens_tensor, segments_tensors)

# Or get the multiple choice loss (set model to train mode before if used for training)
labels = torch.tensor([1])
multiple_choice_loss = multiplChoice_model(tokens_tensor, segments_tensors, labels=labels)
```

```python
### Fine-tune BERT using `bertForPreTraining`
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

forPretraining_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForPreTraining', 'bert-base-cased')
forPretraining_model.eval()

# Predict the masked language modeling logits and the next sentence relationship logits
with torch.no_grad():
    masked_lm_logits_scores, seq_relationship_logits = forPretraining_model(tokens_tensor, segments_tensors)
```
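
To actually fine-tune with `bertForPreTraining`, the pre-training losses have to be computed. The following is a minimal sketch only, assuming the forward pass accepts `masked_lm_labels` and `next_sentence_label` keyword arguments as documented in pytorch-pretrained-BERT:

```python
# Minimal fine-tuning sketch (illustrative only; assumes `masked_lm_labels` and
# `next_sentence_label` are accepted as documented in pytorch-pretrained-BERT).
forPretraining_model.train()
masked_lm_labels = tokens_tensor.clone()  # here the LM targets are simply the input tokens
next_sentence_label = torch.tensor([0])   # 0 = the second segment follows the first
total_loss = forPretraining_model(tokens_tensor, segments_tensors,
                                  masked_lm_labels=masked_lm_labels,
                                  next_sentence_label=next_sentence_label)
total_loss.backward()
```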

### Resources

26 changes: 23 additions & 3 deletions huggingface_pytorch-pretrained-bert_gpt.md
@@ -27,18 +27,19 @@ It includes:

### Requirements

Unlike most other PyTorch Hub models, GPT requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex ftfy spacy
```

### Example

Here is an example of how to tokenize the text with `openAIGPTTokenizer`, and then get the hidden states computed by `openAIGPTModel` or predict the next token using `openAIGPTLMHeadModel`. Finally, we showcase how to use `openAIGPTDoubleHeadsModel` to combine the language modeling head and a multiple choice classification head.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTTokenizer', 'openai-gpt')

@@ -48,15 +48,19 @@ tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])


### Get the hidden states computed by `openAIGPTModel`
######################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTModel', 'openai-gpt')
model.eval()

# Compute the hidden states for each layer
with torch.no_grad():
hidden_states = model(tokens_tensor)


### Predict the next token using `openAIGPTLMHeadModel`
#######################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTLMHeadModel', 'openai-gpt')
lm_model.eval()

Expand All @@ -68,6 +73,21 @@ with torch.no_grad():
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == '.</w>'


### Language modeling and multiple choice classification using `openAIGPTDoubleHeadsModel`
####################################################################################
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTDoubleHeadsModel', 'openai-gpt')
double_head_model.eval() # Set the model to train mode if used for training

text_bis = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_bis = tokenizer.tokenize(text_bis)
indexed_tokens_bis = tokenizer.convert_tokens_to_ids(tokenized_text_bis)
# Note: torch.tensor requires both candidate sequences to have the same length;
# pad the shorter sequence so the two lists match before stacking them.
tokens_tensor = torch.tensor([[indexed_tokens, indexed_tokens_bis]])
mc_token_ids = torch.LongTensor([[len(tokenized_text)-1, len(tokenized_text_bis)-1]])

with torch.no_grad():
lm_logits, multiple_choice_logits = double_head_model(tokens_tensor, mc_token_ids)
```
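
As a small, illustrative addition (not in the snippet above), the multiple choice logits can be reduced to the index of the candidate continuation the model prefers:

```python
# Illustrative only: pick the candidate continuation with the highest score.
predicted_choice = torch.argmax(multiple_choice_logits, dim=-1).item()
```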

### Requirement
105 changes: 105 additions & 0 deletions huggingface_pytorch-pretrained-bert_gpt2.md
@@ -0,0 +1,105 @@
---
layout: hub_detail
background-class: hub-background
body-class: hub
title: GPT-2
summary: Language Models are Unsupervised Multitask Learners
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
---

### Model Description

GPT-2 was released together with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford et al. at OpenAI. It is a successor to [GPT](https://github.com/pytorch/hub/blob/master/huggingface_pytorch-pretrained-bert_gpt.md), which was introduced in [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf). It further demonstrates the impressive natural language generation abilities of large language models, along with their ability to perform reasonably well on a diverse range of tasks in a zero-shot setting.

Here are three models based on OpenAI's pre-trained weights, along with the associated tokenizer:
- `gpt2Model`: raw OpenAI GPT-2 Transformer model (fully pre-trained)
- `gpt2LMHeadModel`: OpenAI GPT-2 Transformer with the tied language modeling head on top (fully pre-trained)
- `gpt2DoubleHeadsModel`: OpenAI GPT-2 Transformer with the tied language modeling head and a multiple choice classification head on top (OpenAI GPT-2 Transformer is pre-trained, the multiple choice classification head is only initialized and has to be trained)

Note that two versions of GPT-2 are available for use: the small version (`gpt2`: English model with 12-layer, 768-hidden, 12-heads, 117M parameters) and the medium version (`gpt2-medium`: English model with 24-layer, 1024-hidden, 16-heads, 345M parameters).
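
As a minimal sketch (assuming the `gpt2-medium` shortcut is exposed through the same hub entry points), switching to the medium model only requires changing the checkpoint name:

```python
import torch

# Illustrative only: load the 345M-parameter checkpoint by swapping the shortcut name.
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Tokenizer', 'gpt2-medium')
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Model', 'gpt2-medium')
```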

### Requirements

Unlike most other PyTorch Hub models, GPT-2 requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex
```

Using `python3` is recommended for these models, in particular for the tokenizer.

### Example

Here is an example of how to tokenize the text with `gpt2Tokenizer`, and then get the hidden states computed by `gpt2Model` or predict the next token using `gpt2LMHeadModel`. Finally, we showcase how to use `gpt2DoubleHeadsModel` to combine the language modeling head and a multiple choice classification head.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Tokenizer', 'gpt2')

# Prepare tokenized input
text_1 = "Who was Jim Henson ? Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_1 = tokenizer.tokenize(text_1)
tokenized_text_2 = tokenizer.tokenize(text_2)
indexed_tokens1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
indexed_tokens2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)
tokens_tensor_1 = torch.tensor([indexed_tokens1])
tokens_tensor_2 = torch.tensor([indexed_tokens2])


### Get the hidden states computed by `gpt2Model`
#################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Model', 'gpt2')
model.eval()

# Compute the hidden states for each layer
# `past` can be used to reuse precomputed hidden states in subsequent predictions
with torch.no_grad():
hidden_states_1, past = model(tokens_tensor_1)
hidden_states_2, past = model(tokens_tensor_2, past=past)


### Predict the next token using `gpt2LMHeadModel`
##################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2LMHeadModel', 'gpt2')
lm_model.eval()

# Predict the language modeling logits
with torch.no_grad():
predictions_1, past = lm_model(tokens_tensor_1)
predictions_2, past = lm_model(tokens_tensor_2, past=past)

# Get the predicted last token
predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
predicted_token = tokenizer.decode([predicted_index])
assert predicted_token == ' who'


### Language modeling and multiple choice classification using `gpt2DoubleHeadsModel`
###############################################################################
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2DoubleHeadsModel', 'gpt2')
double_head_model.eval() # Set the model to train mode if used for training

# Note: torch.tensor requires both candidate sequences to have the same length;
# pad the shorter sequence so the two lists match before stacking them.
tokens_tensor = torch.tensor([[indexed_tokens1, indexed_tokens2]])
mc_token_ids = torch.LongTensor([[len(tokenized_text_1) - 1, len(tokenized_text_2) - 1]])

with torch.no_grad():
lm_logits, multiple_choice_logits, presents = double_head_model(tokens_tensor, mc_token_ids)
```
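
To make the role of `past` more concrete, here is a minimal greedy-generation sketch (illustrative only, reusing `lm_model` and `tokenizer` from above); it feeds one new token at a time and passes `past` so the cached key/value states are reused:

```python
# Illustrative greedy decoding sketch: generate 5 tokens after the first input.
generated = tokens_tensor_1
past = None
with torch.no_grad():
    for _ in range(5):
        input_ids = generated if past is None else generated[:, -1:]
        logits, past = lm_model(input_ids, past=past)
        next_token = torch.argmax(logits[0, -1, :]).item()
        generated = torch.cat([generated, torch.tensor([[next_token]])], dim=1)
print(tokenizer.decode(generated[0].tolist()))
```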

### Resources

- Paper: [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/)
- [Blogpost from OpenAI](https://openai.com/blog/better-language-models/)
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)
86 changes: 86 additions & 0 deletions huggingface_pytorch-pretrained-bert_transformerXL.md
@@ -0,0 +1,86 @@
---
layout: hub_detail
background-class: hub-background
body-class: hub
title: Transformer-XL
summary: Attentive Language Models Beyond a Fixed-Length Context
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
---

### Model Description

Transformer-XL was released together with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](http://arxiv.org/abs/1901.02860) by Zihang Dai et al. This PyTorch implementation of Transformer-XL is an adaptation of the original [PyTorch implementation](https://github.com/kimiyoung/transformer-xl), slightly modified to match the performance of the TensorFlow implementation and to allow reuse of the pretrained weights.

Here are two models based on the author's pre-trained weights, along with the associated tokenizer:
- `transformerXLModel`: Transformer-XL model which outputs the last hidden state and memory cells (fully pre-trained)
- `transformerXLLMHeadModel`: Transformer-XL with the tied adaptive softmax head on top for language modeling which outputs the logits/loss and memory cells (fully pre-trained)

### Requirements

Unlike most other PyTorch Hub models, Transformer-XL requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex
```

### Example

Here is an example of how to tokenize the text with `transformerXLTokenizer`, and then get the hidden states computed by `transformerXLModel` or predict the next token using `transformerXLLMHeadModel`.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLTokenizer', 'transfo-xl-wt103')

# Prepare tokenized input
text_1 = "Who was Jim Henson ?"
text_2 = "Jim Henson was a puppeteer"
tokenized_text_1 = tokenizer.tokenize(text_1)
tokenized_text_2 = tokenizer.tokenize(text_2)
indexed_tokens_1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
indexed_tokens_2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)
tokens_tensor_1 = torch.tensor([indexed_tokens_1])
tokens_tensor_2 = torch.tensor([indexed_tokens_2])

### Get the hidden states computed by `transformerXLModel`
##########################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLModel', 'transfo-xl-wt103')
model.eval()

# Compute the hidden states for each layer
# `mems` can be used to reuse precomputed hidden states in subsequent predictions
with torch.no_grad():
hidden_states_1, mems_1 = model(tokens_tensor_1)
hidden_states_2, mems_2 = model(tokens_tensor_2, mems=mems_1)

### Predict the next token using `transformerXLLMHeadModel`
###########################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLLMHeadModel', 'transfo-xl-wt103')
lm_model.eval()

# Predict the language modeling logits
with torch.no_grad():
predictions_1, mems_1 = lm_model(tokens_tensor_1)
predictions_2, mems_2 = lm_model(tokens_tensor_2, mems=mems_1)

# Get the predicted last token
predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'who'
```
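
To make the role of `mems` more concrete, here is a minimal greedy-generation sketch (illustrative only, reusing `lm_model` and `tokenizer` from above); it feeds one token at a time and carries the memory cells forward:

```python
# Illustrative greedy decoding sketch: generate 5 tokens while carrying `mems` forward.
generated = tokens_tensor_1
mems = None
with torch.no_grad():
    for _ in range(5):
        input_ids = generated if mems is None else generated[:, -1:]
        predictions, mems = lm_model(input_ids, mems=mems)
        next_token = torch.argmax(predictions[0, -1, :]).item()
        generated = torch.cat([generated, torch.tensor([[next_token]])], dim=1)
print(tokenizer.convert_ids_to_tokens(generated[0].tolist()))
```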

### Resources

- Paper: [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](http://arxiv.org/abs/1901.02860)
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)
- Original author's [implementation](https://github.com/kimiyoung/transformer-xl)
