Update from pytorch_pretrained_bert - HuggingFace (pytorch#40)

* Update BERT examples
* Update GPT examples
* Add Transformer-XL
* fix branch to master on pytorch_pretrained_bert for Transformer-XL
* adding accelerator tag for Transformer-XL
* Add GPT-2
* Space in summary title
* cleaning
* Update huggingface_pytorch-pretrained-bert_transformerXL.md
* Update huggingface_pytorch-pretrained-bert_gpt2.md
* Update huggingface_pytorch-pretrained-bert_transformerXL.md
* bugfix
* bugfix
* bugfix
1 parent 6ed75c1, commit 2c3b932, showing 4 changed files with 308 additions and 4 deletions.
`huggingface_pytorch-pretrained-bert_gpt2.md`:

---
layout: hub_detail
background-class: hub-background
body-class: hub
title: GPT-2
summary: Language Models are Unsupervised Multitask Learners
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
---

### Model Description

GPT-2 was released together with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford et al. at OpenAI. It is a development of [GPT](https://github.com/pytorch/hub/blob/master/huggingface_pytorch-pretrained-bert_gpt.md), introduced in [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf), and demonstrates the impressive natural language generation abilities of large language models, along with their ability to perform reasonably well on a diverse range of tasks in a zero-shot setting.

Here are three models based on OpenAI's pre-trained weights, along with the associated tokenizer:
- `gpt2Model`: raw OpenAI GPT-2 Transformer model (fully pre-trained)
- `gpt2LMHeadModel`: OpenAI GPT-2 Transformer with the tied language modeling head on top (fully pre-trained)
- `gpt2DoubleHeadsModel`: OpenAI GPT-2 Transformer with the tied language modeling head and a multiple choice classification head on top (the OpenAI GPT-2 Transformer is pre-trained, the multiple choice classification head is only initialized and has to be trained)

Note that two versions of GPT-2 are available: the small version (`gpt2`: English model with 12 layers, 768 hidden units, 12 heads, 117M parameters) and the medium version (`gpt2-medium`: English model with 24 layers, 1024 hidden units, 16 heads, 345M parameters).
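
For example, once the requirements below are installed, the medium checkpoint can be loaded by passing the `gpt2-medium` identifier to the same entry points. This is a minimal illustrative sketch, assuming `gpt2-medium` is accepted the same way `gpt2` is in the example further down:

```python
import torch

# Assumed usage: the 'gpt2-medium' identifier selects the 345M-parameter checkpoint
medium_tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Tokenizer', 'gpt2-medium')
medium_lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2LMHeadModel', 'gpt2-medium')
medium_lm_model.eval()
```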

### Requirements

Unlike most other PyTorch Hub models, GPT-2 requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex
```

Using `python3` is recommended for these models, in particular for the tokenizer.

### Example

Here is an example of how to tokenize text with `gpt2Tokenizer`, get the hidden states computed by `gpt2Model`, and predict the next token with `gpt2LMHeadModel`. Finally, we show how to use `gpt2DoubleHeadsModel` to combine the language modeling head with a multiple choice classification head.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Tokenizer', 'gpt2')

# Prepare tokenized input
text_1 = "Who was Jim Henson ? Jim Henson was a puppeteer"
text_2 = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_1 = tokenizer.tokenize(text_1)
tokenized_text_2 = tokenizer.tokenize(text_2)
indexed_tokens1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
indexed_tokens2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)
tokens_tensor_1 = torch.tensor([indexed_tokens1])
tokens_tensor_2 = torch.tensor([indexed_tokens2])


### Get the hidden states computed by `gpt2Model`
#################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2Model', 'gpt2')
model.eval()

# Predict the hidden states for each layer
# `past` can be used to reuse precomputed hidden states in subsequent predictions
with torch.no_grad():
    hidden_states_1, past = model(tokens_tensor_1)
    hidden_states_2, past = model(tokens_tensor_2, past=past)


### Predict the next token using `gpt2LMHeadModel`
##################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2LMHeadModel', 'gpt2')
lm_model.eval()

# Predict the language modeling logits for each position in the two sequences
with torch.no_grad():
    predictions_1, past = lm_model(tokens_tensor_1)
    predictions_2, past = lm_model(tokens_tensor_2, past=past)

# Get the predicted last token
predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
predicted_token = tokenizer.decode([predicted_index])
assert predicted_token == ' who'


### Language modeling and multiple choice classification with `gpt2DoubleHeadsModel`
#####################################################################################
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'gpt2DoubleHeadsModel', 'gpt2')
double_head_model.eval()  # Evaluation mode; switch to train mode if used for training

# Build a batch of shape [batch_size, num_choices, sequence_length];
# note that both choices must have the same length (pad the shorter one if needed)
tokens_tensor = torch.tensor([[indexed_tokens1, indexed_tokens2]])
# Index of the token in each choice used by the multiple choice classification head
mc_token_ids = torch.LongTensor([[len(tokenized_text_1) - 1, len(tokenized_text_2) - 1]])

with torch.no_grad():
    lm_logits, multiple_choice_logits, presents = double_head_model(tokens_tensor, mc_token_ids)
```
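
Building on the objects defined above, the `past` cache can also be used for simple greedy generation. The following is a minimal illustrative sketch (not part of the original example), assuming the `(logits, past)` return signature shown above:

```python
# Greedy generation sketch: feed only the newly generated token back in,
# carrying the `past` cache forward at each step.
generated = tokens_tensor_1
next_input = tokens_tensor_1
past = None
with torch.no_grad():
    for _ in range(10):  # generate 10 tokens
        logits, past = lm_model(next_input, past=past)
        next_token_id = torch.argmax(logits[0, -1, :]).item()
        next_input = torch.tensor([[next_token_id]])
        generated = torch.cat([generated, next_input], dim=1)

print(tokenizer.decode(generated[0].tolist()))
```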

### Resources

- Paper: [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/)
- [Blogpost from OpenAI](https://openai.com/blog/better-language-models/)
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)

`huggingface_pytorch-pretrained-bert_transformerXL.md`:

---
layout: hub_detail
background-class: hub-background
body-class: hub
title: Transformer-XL
summary: Attentive Language Models Beyond a Fixed-Length Context
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [nlp]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: no-image
featured_image_2: no-image
accelerator: cuda-optional
order: 10
---

### Model Description

Transformer-XL was released together with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](http://arxiv.org/abs/1901.02860) by Zihang Dai et al. This PyTorch implementation of Transformer-XL is an adaptation of the original [PyTorch implementation](https://github.com/kimiyoung/transformer-xl), slightly modified to match the performance of the TensorFlow implementation and to allow re-use of the pretrained weights.

Here are two models based on the authors' pre-trained weights, along with the associated tokenizer:
- `transformerXLModel`: Transformer-XL model which outputs the last hidden state and memory cells (fully pre-trained)
- `transformerXLLMHeadModel`: Transformer-XL with the tied adaptive softmax head on top for language modeling, which outputs the logits/loss and memory cells (fully pre-trained)

### Requirements

Unlike most other PyTorch Hub models, Transformer-XL requires a few additional Python packages to be installed.

```bash
pip install tqdm boto3 requests regex
```

### Example

Here is an example of how to tokenize text with `transformerXLTokenizer`, get the hidden states computed by `transformerXLModel`, and predict the next token with `transformerXLLMHeadModel`.

```python
### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLTokenizer', 'transfo-xl-wt103')

# Prepare tokenized input
text_1 = "Who was Jim Henson ?"
text_2 = "Jim Henson was a puppeteer"
tokenized_text_1 = tokenizer.tokenize(text_1)
tokenized_text_2 = tokenizer.tokenize(text_2)
indexed_tokens_1 = tokenizer.convert_tokens_to_ids(tokenized_text_1)
indexed_tokens_2 = tokenizer.convert_tokens_to_ids(tokenized_text_2)
tokens_tensor_1 = torch.tensor([indexed_tokens_1])
tokens_tensor_2 = torch.tensor([indexed_tokens_2])


### Get the hidden states computed by `transformerXLModel`
##########################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLModel', 'transfo-xl-wt103')
model.eval()

# Predict the hidden states for each layer
# `mems` can be used to reuse precomputed hidden states in subsequent predictions
with torch.no_grad():
    hidden_states_1, mems_1 = model(tokens_tensor_1)
    hidden_states_2, mems_2 = model(tokens_tensor_2, mems=mems_1)


### Predict the next token using `transformerXLLMHeadModel`
###########################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'transformerXLLMHeadModel', 'transfo-xl-wt103')
lm_model.eval()

# Predict the language modeling logits for each position in the two sequences
with torch.no_grad():
    predictions_1, mems_1 = lm_model(tokens_tensor_1)
    predictions_2, mems_2 = lm_model(tokens_tensor_2, mems=mems_1)

# Get the predicted last token
predicted_index = torch.argmax(predictions_2[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'who'
```
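
The memory (`mems`) returned at each step is what lets Transformer-XL attend beyond a fixed-length context. As a minimal illustrative sketch (not part of the original example), a longer text can be fed in fixed-size segments while carrying `mems` forward, reusing the `tokenizer` and `lm_model` defined above:

```python
# Stream a longer text through the model in small segments, carrying the
# memory (`mems`) from one segment to the next.
long_text = "Who was Jim Henson ? Jim Henson was a puppeteer who created The Muppets"
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(long_text))

segment_len = 4
mems = None
with torch.no_grad():
    for start in range(0, len(ids), segment_len):
        segment = torch.tensor([ids[start:start + segment_len]])
        if mems is None:
            predictions, mems = lm_model(segment)
        else:
            predictions, mems = lm_model(segment, mems=mems)

# Logits of the last segment are conditioned on all previous segments via `mems`
predicted_index = torch.argmax(predictions[0, -1, :]).item()
print(tokenizer.convert_ids_to_tokens([predicted_index]))
```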

### Resources

- Paper: [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](http://arxiv.org/abs/1901.02860)
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)
- Original author's [implementation](https://github.com/kimiyoung/transformer-xl)