---
layout: pytorch_hub_detail
background-class: pytorch-hub-background
body-class: pytorch-hub
title: BERT
summary: BERT models
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: [NLP, BERT]
github-link: https://github.com/huggingface/pytorch-pretrained-BERT
featured_image_1: no-image
featured_image_2: no-image
---

### Model Description

BERT was released together with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin et al. The model is based on the Transformer architecture introduced in [Attention Is All You Need](https://arxiv.org/abs/1706.03762) by Ashish Vaswani et al., and has led to significant improvements on a wide range of downstream tasks.

Here are eight models based on BERT, loaded with Google's pre-trained weights, along with the associated tokenizer. The entry points include:

- `bertTokenizer`: performs end-to-end tokenization, i.e. basic tokenization followed by WordPiece tokenization
- `bertModel`: raw BERT Transformer model (fully pre-trained)
- `bertForMaskedLM`: BERT Transformer with the pre-trained masked language modeling head on top (fully pre-trained)
- `bertForNextSentencePrediction`: BERT Transformer with the pre-trained next sentence prediction classifier on top (fully pre-trained)
- `bertForPreTraining`: BERT Transformer with the masked language modeling head and the next sentence prediction classifier on top (fully pre-trained)
- `bertForSequenceClassification`: BERT Transformer with a sequence classification head on top (the BERT Transformer is pre-trained, the sequence classification head is only initialized and has to be trained; see the loading sketch after this list)
- `bertForMultipleChoice`: BERT Transformer with a multiple choice head on top, used for tasks like SWAG (the BERT Transformer is pre-trained, the multiple choice head is only initialized and has to be trained)
- `bertForTokenClassification`: BERT Transformer with a token classification head on top (the BERT Transformer is pre-trained, the token classification head is only initialized and has to be trained)
- `bertForQuestionAnswering`: BERT Transformer with a span classification head on top that predicts answer start and end positions (the BERT Transformer is pre-trained, the span classification head is only initialized and has to be trained)
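
The last four heads are only initialized, so they need fine-tuning before use. As a minimal loading sketch, assuming the hub entry point forwards extra keyword arguments such as `num_labels` to the underlying `from_pretrained` call (this argument is not shown in the original document):

```python
import torch

# Load BERT with a randomly initialized sequence classification head.
# Assumption: `num_labels` is forwarded to the underlying model constructor.
classifier_model = torch.hub.load(
    'huggingface/pytorch-pretrained-BERT',
    'bertForSequenceClassification',
    'bert-base-cased',
    num_labels=2,
)
classifier_model.train()  # the classification head still has to be fine-tuned on labeled data
```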

### Example:

Here are three examples showing how to use `bertTokenizer`, `bertModel` and `bertForMaskedLM`.

First, we prepare the inputs by tokenizing the text.

```python
import torch

# Load pre-trained model tokenizer (vocabulary)
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertTokenizer', 'bert-base-cased', do_basic_tokenize=False)

# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
```
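
As a quick sanity check (not part of the original example), you can inspect the tokenizer output; the assumed length of 16 matches the `segments_ids` list used in the next snippet:

```python
# Inspect the WordPiece output (illustrative; the exact pieces depend on the vocabulary),
# e.g. ['[CLS]', 'Who', 'was', 'Jim', 'He', '##nson', '?', '[SEP]', 'Jim', ...]
print(tokenized_text)
print(indexed_tokens)  # the corresponding vocabulary ids, one per token

# The sentence A/B segment ids defined in the next snippet assume 16 tokens
assert len(tokenized_text) == len(indexed_tokens) == 16
```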

We can get the hidden states computed by `bertModel`.

```python
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertTokenizer', 'bert-base-cased', do_basic_tokenize=False)
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
segments_tensors = torch.tensor([segments_ids])
tokens_tensor = torch.tensor([indexed_tokens])

model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertModel', 'bert-base-cased')
model.eval()

with torch.no_grad():
    encoded_layers, _ = model(tokens_tensor, segments_tensors)
```
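
The forward pass above also returns a pooled output, which is discarded here; `encoded_layers` holds one hidden-state tensor per layer. A short follow-up sketch, with shapes assuming `bert-base-cased` (12 layers, hidden size 768) and the 16-token input above:

```python
# One tensor per Transformer layer, each of shape
# [batch_size, sequence_length, hidden_size]
print(len(encoded_layers))       # expected: 12
print(encoded_layers[-1].shape)  # expected: torch.Size([1, 16, 768])

# The last layer's hidden states are commonly used as contextual token embeddings
last_hidden_states = encoded_layers[-1]
```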

We can predict masked tokens using `bertForMaskedLM`.

```python
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertTokenizer', 'bert-base-cased', do_basic_tokenize=False)
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
segments_tensors = torch.tensor([segments_ids])

# Mask a token that we will try to predict back with `bertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])

model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'bertForMaskedLM', 'bert-base-cased')
model.eval()

with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)

# Get the predicted token
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
```
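
As a final check (not part of the original example), print the prediction. The masked position (index 8) held the second `Jim` in the input, so a well-trained model is expected, though not guaranteed, to recover it:

```python
# The token at index 8 was the second "Jim" before masking
print(predicted_token)  # expected: 'Jim'
```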

### Resources: