layout | background-class | body-class | title | summary | category | image | author | tags | github-link | featured_image_1 | featured_image_2 | accelerator | order
---|---|---|---|---|---|---|---|---|---|---|---|---|---
hub_detail | hub-background | hub | GPT | Generative Pre-Training (GPT) models for language understanding | researchers | huggingface-logo.png | HuggingFace Team | | | GPT1.png | no-image | cuda-optional | 10
GPT was released together with the paper *Improving Language Understanding by Generative Pre-Training* by Alec Radford et al. at OpenAI. It combines two ideas: the Transformer architecture and large-scale unsupervised pre-training.
This entry provides three models based on OpenAI's pre-trained weights, along with the associated tokenizer:

- `openAIGPTModel`: the raw OpenAI GPT Transformer model (fully pre-trained)
- `openAIGPTLMHeadModel`: OpenAI GPT Transformer with the tied language modeling head on top (fully pre-trained)
- `openAIGPTDoubleHeadsModel`: OpenAI GPT Transformer with the tied language modeling head and a multiple choice classification head on top (the Transformer is pre-trained, the multiple choice classification head is only initialized and has to be trained)
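As a quick sanity check, you can enumerate the entrypoints the repository exposes before loading anything. This is a small sketch using `torch.hub.list`; the exact names returned depend on the repository's `hubconf.py`:

```python
import torch

# List the entrypoints defined in the repository's hubconf.py
# (this clones the repo into the local torch hub cache on first use)
print(torch.hub.list('huggingface/pytorch-pretrained-BERT'))
```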
Unlike most other PyTorch Hub models, GPT requires a few additional Python packages to be installed.
```bash
pip install tqdm boto3 requests regex ftfy spacy
```
The example below shows how to tokenize text with `openAIGPTTokenizer`, get the hidden states computed by `openAIGPTModel`, and predict the next token with `openAIGPTLMHeadModel`. Finally, it shows how to use `openAIGPTDoubleHeadsModel` to combine the language modeling head with a multiple choice classification head.
### First, tokenize the input

```python
import torch

tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTTokenizer', 'openai-gpt')

# Prepare tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
```
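If you want to see what the tokenizer produced, a quick inspection like the following works (the exact sub-word splits depend on the pre-trained BPE vocabulary, so the printed tokens are only illustrative):

```python
# Inspect the intermediate results of the tokenization step
print(tokenized_text)       # sub-word tokens produced by the BPE tokenizer
print(indexed_tokens)       # vocabulary ids for each sub-word token
print(tokens_tensor.shape)  # torch.Size([1, sequence_length])
```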
### Get the hidden states computed by `openAIGPTModel`

```python
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTModel', 'openai-gpt')
model.eval()

# Compute hidden states features for each layer
with torch.no_grad():
    hidden_states = model(tokens_tensor)
```
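A small sketch for checking what came back. Depending on the version of the repository, `hidden_states` is either a single tensor of shape `[batch_size, sequence_length, hidden_size]` or a list of per-layer tensors, so the check below handles both cases (this branching is an assumption, not part of the original example):

```python
# Print the shape(s) of the returned hidden states; handle both a single
# tensor and a list of per-layer tensors, depending on the model version
if torch.is_tensor(hidden_states):
    print(hidden_states.shape)
else:
    print([h.shape for h in hidden_states])
```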
### Predict the next token using `openAIGPTLMHeadModel`

```python
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTLMHeadModel', 'openai-gpt')
lm_model.eval()

# Predict all tokens
with torch.no_grad():
    predictions = lm_model(tokens_tensor)

# Get the last predicted token
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == '.</w>'
```
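Building on the same `lm_model` and `tokenizer`, a simple greedy decoding loop can extend the prompt token by token. This is a minimal sketch, assuming the model returns next-token logits of shape `[batch_size, sequence_length, vocab_size]` as in the example above; the generation length of 10 tokens is arbitrary:

```python
# Greedily generate 10 additional tokens after the prompt
generated = list(indexed_tokens)
with torch.no_grad():
    for _ in range(10):
        logits = lm_model(torch.tensor([generated]))
        next_index = torch.argmax(logits[0, -1, :]).item()
        generated.append(next_index)

print(tokenizer.convert_ids_to_tokens(generated))
```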
### Language modeling and multiple choice classification with `openAIGPTDoubleHeadsModel`

```python
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTDoubleHeadsModel', 'openai-gpt')
double_head_model.eval()  # Set the model to evaluation mode (use .train() when fine-tuning the classification head)

# Prepare a second candidate continuation; both sequences must tokenize to the same
# length so they can be stacked into a single tensor of shape [1, 2, sequence_length]
text_bis = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_bis = tokenizer.tokenize(text_bis)
indexed_tokens_bis = tokenizer.convert_tokens_to_ids(tokenized_text_bis)
tokens_tensor = torch.tensor([[indexed_tokens, indexed_tokens_bis]])
mc_token_ids = torch.LongTensor([[len(tokenized_text) - 1, len(tokenized_text_bis) - 1]])

with torch.no_grad():
    lm_logits, multiple_choice_logits = double_head_model(tokens_tensor, mc_token_ids)
```
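To turn the classification head's output into an actual choice, take the arg-max over the choice dimension. This assumes `multiple_choice_logits` has shape `[batch_size, num_choices]`; note that the multiple choice head is freshly initialized here, so the selected index is meaningless until the head has been fine-tuned:

```python
# Pick the highest-scoring continuation (0 -> text, 1 -> text_bis)
best_choice = torch.argmax(multiple_choice_logits, dim=-1).item()
print("Selected choice:", best_choice)
```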
The model only supports Python 3.
- Paper: *Improving Language Understanding by Generative Pre-Training*
- Blog post from OpenAI
- Initial repository (with detailed examples and documentation): [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT)