Commit 3c2cf3b: Misc changes
Summary: Pull Request resolved: fairinternal/fairseq-py#840

Differential Revision: D16947645

Pulled By: myleott

fbshipit-source-id: e869789bc22bbf5cb08d9adfa44f9fc09b3805af
myleott authored and facebook-github-bot committed Aug 22, 2019
1 parent 93057cc commit 3c2cf3b
Showing 4 changed files with 15 additions and 166 deletions.
8 changes: 7 additions & 1 deletion examples/language_model/README.md
@@ -12,7 +12,7 @@ Model | Description | Dataset | Download

## Example usage

-Sampling from a language model using PyTorch Hub:
+To sample from a language model using PyTorch Hub:
```python
import torch

@@ -25,6 +25,12 @@ en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en', tokenizer='moses', bpe='fastbpe')
# Sample from the language model
en_lm.sample('Barack Obama', beam=1, sampling=True, sampling_topk=10, temperature=0.8)
# "Barack Obama is coming to Sydney and New Zealand (...)"

# The same interface can be used with custom models as well
from fairseq.models.transformer_lm import TransformerLanguageModel
custom_lm = TransformerLanguageModel.from_pretrained('/path/to/model/dir', 'checkpoint100.pt', tokenizer='moses', bpe='fastbpe')
custom_lm.sample('Barack Obama', beam=5)
# "Barack Obama (...)"
```
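The hub interface can also be used to score text. A minimal sketch, assuming the `score()` method exposed by recent fairseq hub interfaces, which returns per-token log-probabilities:
```python
import torch

en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en', tokenizer='moses', bpe='fastbpe')
en_lm.eval()  # disable dropout

# score() returns (among other fields) per-token log-probabilities;
# negating the mean and exponentiating gives perplexity.
out = en_lm.score('Barack Obama is coming to Sydney and New Zealand')
perplexity = out['positional_scores'].mean().neg().exp()
```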

## Training a transformer language model with the CLI tools
7 changes: 7 additions & 0 deletions examples/roberta/README.md
@@ -76,6 +76,13 @@ Model | Accuracy
---|---
`roberta.large` | 78.1

**[XNLI (Conneau et al., 2018)](https://arxiv.org/abs/1809.05053)**
_(TRANSLATE-TEST)_

Model | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
`roberta.large.mnli` | 91.3 | 82.91 | 84.27 | 81.24 | 81.74 | 83.13 | 78.28 | 76.79 | 76.64 | 74.17 | 74.05 | 77.5 | 70.9 | 66.65 | 66.81
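
For context, the sketch below shows how such predictions can be produced with the MNLI-finetuned model via torch.hub; for TRANSLATE-TEST, the English premise/hypothesis pair shown here would be replaced with machine-translated inputs.
```python
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
roberta.eval()  # disable dropout

# Encode a premise/hypothesis pair and classify it with the MNLI head.
tokens = roberta.encode('Roberta is a heavily optimized version of BERT.',
                        'Roberta is not very optimized.')
prediction = roberta.predict('mnli', tokens).argmax().item()
# 0: contradiction, 1: neutral, 2: entailment
```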

## Example usage

##### Load RoBERTa from torch.hub (PyTorch >= 1.1):
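A minimal sketch, assuming the standard fairseq hub entry point; `extract_features` returns the final-layer representations:
```python
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for evaluation

# Apply BPE and extract final-layer features for a sentence.
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)  # shape: (batch, seq_len, hidden)
```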
2 changes: 1 addition & 1 deletion examples/roberta/README.pretraining.md
@@ -54,7 +54,7 @@ PEAK_LR=0.0005 # Peak learning rate, adjust as needed
TOKENS_PER_SAMPLE=512 # Max sequence length
MAX_POSITIONS=512 # Num. positional embeddings (usually same as above)
MAX_SENTENCES=16 # Number of sequences per batch (batch size)
-UPDATE_FREQ=16         # Increase the batch size 16x
+UPDATE_FREQ=16         # Increase the batch size 16x

DATA_DIR=data-bin/wikitext-103

164 changes: 0 additions & 164 deletions fairseq/tasks/tagged_language_modeling.py

This file was deleted.
