Add documentation
myleott committed Sep 3, 2018
1 parent 0e101e9 commit 6381cc9
Showing 44 changed files with 2,487 additions and 262 deletions.
185 changes: 20 additions & 165 deletions README.md
@@ -1,6 +1,9 @@
# Introduction

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks. It provides reference implementations
of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](https://arxiv.org/abs/1612.08083)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122)
@@ -16,10 +19,12 @@ Fairseq(-py) is a sequence modeling toolkit that allows researchers and develope
Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast beam search generation on both CPU and GPU
- large mini-batch training even on a single GPU via delayed updates
- fast half-precision floating point (FP16) training
- extensible: easily register new models, criterions, and tasks

We also provide [pre-trained models](#pre-trained-models) for several benchmark
translation and language modeling datasets.

![Model](fairseq.gif)

@@ -31,112 +36,20 @@ We also provide [pre-trained models](#pre-trained-models) for several benchmark
Currently fairseq requires PyTorch version >= 0.4.0.
Please follow the instructions here: https://github.com/pytorch/pytorch#installation.
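
For example, one common route is installing a pre-built package with pip (illustrative; pick the command from the PyTorch instructions that matches your Python and CUDA setup):
```
$ pip install torch
```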

If you use Docker make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.
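
For example (a sketch; the image name is a placeholder for whatever CUDA-enabled image you use):
```
$ nvidia-docker run --ipc=host -it my-pytorch-image bash
```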

After PyTorch is installed, you can install fairseq with:
```
pip install -r requirements.txt
python setup.py build develop
```
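
As a quick smoke test (assuming the commands above completed without error):
```
$ python -c "import torch; print(torch.__version__)"   # should print >= 0.4.0
$ python -c "import fairseq"                           # should import cleanly
```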

# Getting Started

The following command-line tools are provided:
* `python preprocess.py`: Data pre-processing: build vocabularies and binarize training data
* `python train.py`: Train a new model on one or multiple GPUs
* `python generate.py`: Translate pre-processed data with a trained model
* `python interactive.py`: Translate raw text with a trained model
* `python score.py`: BLEU scoring of generated translations against reference translations
* `python eval_lm.py`: Language model evaluation
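
Each of these tools documents its full set of options via `--help`, e.g.:
```
$ python train.py --help
```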

## Evaluating Pre-trained Models
First, download a pre-trained model along with its vocabularies:
```
$ curl https://s3.amazonaws.com/fairseq-py/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf -
```

This model uses a [Byte Pair Encoding (BPE) vocabulary](https://arxiv.org/abs/1508.07909), so we'll have to apply the encoding to the source text before it can be translated.
This can be done with the [apply_bpe.py](https://github.com/rsennrich/subword-nmt/blob/master/apply_bpe.py) script using the `wmt14.en-fr.fconv-py/bpecodes` file.
`@@` is used as a continuation marker and the original text can be easily recovered with e.g. `sed 's/@@ //g'` or by passing the `--remove-bpe` flag to `generate.py`.
Prior to BPE, input text needs to be tokenized using `tokenizer.perl` from [mosesdecoder](https://github.com/moses-smt/mosesdecoder).
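
Putting these steps together, a sketch of the full pre-processing pipeline for a single sentence might look as follows (assuming local checkouts of mosesdecoder and subword-nmt; the paths are illustrative):
```
$ echo "Why is it rare to discover new marine mammal species ?" \
    | perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l en \
    | python subword-nmt/apply_bpe.py -c wmt14.en-fr.fconv-py/bpecodes
```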

Let's use `python interactive.py` to generate translations interactively.
Here, we use a beam size of 5:
```
$ MODEL_DIR=wmt14.en-fr.fconv-py
$ python interactive.py \
--path $MODEL_DIR/model.pt $MODEL_DIR \
--beam 5
| loading model(s) from wmt14.en-fr.fconv-py/model.pt
| [en] dictionary: 44206 types
| [fr] dictionary: 44463 types
| Type the input sentence and press return:
> Why is it rare to discover new marine mam@@ mal species ?
O Why is it rare to discover new marine mam@@ mal species ?
H -0.06429661810398102 Pourquoi est-il rare de découvrir de nouvelles espèces de mammifères marins ?
A 0 1 3 3 5 6 6 8 8 8 7 11 12
```

This generation script produces four types of outputs: a line prefixed with *S* shows the supplied source sentence after applying the vocabulary; *O* is a copy of the original source sentence; *H* is the hypothesis along with an average log-likelihood; and *A* is the attention maxima for each word in the hypothesis, including the end-of-sentence marker which is omitted from the text.

Check [below](#pre-trained-models) for a full list of pre-trained models available.

## Training a New Model

The following tutorial is for machine translation.
For an example of how to use Fairseq for other tasks, such as [language modeling](examples/language_model/README.md), please see the `examples/` directory.

### Data Pre-processing

Fairseq contains example pre-processing scripts for several translation datasets: IWSLT 2014 (German-English), WMT 2014 (English-French) and WMT 2014 (English-German).
To pre-process and binarize the IWSLT dataset:
```
$ cd examples/translation/
$ bash prepare-iwslt14.sh
$ cd ../..
$ TEXT=examples/translation/iwslt14.tokenized.de-en
$ python preprocess.py --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
```
This will write binarized data that can be used for model training to `data-bin/iwslt14.tokenized.de-en`.

### Training
Use `python train.py` to train a new model.
Here are a few example settings that work well for the IWSLT 2014 dataset:
```
$ mkdir -p checkpoints/fconv
$ CUDA_VISIBLE_DEVICES=0 python train.py data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
```

By default, `python train.py` will use all available GPUs on your machine.
Use the [CUDA_VISIBLE_DEVICES](http://acceleware.com/blog/cudavisibledevices-masking-gpus) environment variable to select specific GPUs and/or to change the number of GPU devices that will be used.

Also note that the batch size is specified in terms of the maximum number of tokens per batch (`--max-tokens`).
You may need to use a smaller value depending on the available GPU memory on your system.
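
For example, to train on two specific GPUs with a smaller per-batch token budget (the values are illustrative):
```
$ CUDA_VISIBLE_DEVICES=0,1 python train.py data-bin/iwslt14.tokenized.de-en \
    --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 2000 \
    --arch fconv_iwslt_de_en --save-dir checkpoints/fconv
```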

### Generation
Once your model is trained, you can generate translations using `python generate.py` **(for binarized data)** or `python interactive.py` **(for raw text)**:
```
$ python generate.py data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5
| [de] dictionary: 35475 types
| [en] dictionary: 24739 types
| data-bin/iwslt14.tokenized.de-en test 6750 examples
| model fconv
| loaded checkpoint checkpoints/fconv/checkpoint_best.pt
S-721 danke .
T-721 thank you .
...
```

To generate translations with only a CPU, use the `--cpu` flag.
BPE continuation markers can be removed with the `--remove-bpe` flag.
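
For example, to decode the test set on CPU and strip the BPE markers in one go (flags as documented above):
```
$ python generate.py data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/fconv/checkpoint_best.pt \
    --batch-size 128 --beam 5 --cpu --remove-bpe
```
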
The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.

# Pre-trained Models

@@ -185,68 +98,6 @@ $ python score.py --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref
BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
```

# Large mini-batch training with delayed updates

The `--update-freq` option can be used to accumulate gradients from multiple mini-batches and delay updating,
creating a larger effective batch size.
Delayed updates can also improve training speed by reducing inter-GPU communication costs and by saving idle time caused by variance in workload across GPUs.
See [Ott et al. (2018)](https://arxiv.org/abs/1806.00187) for more details.

To train on a single GPU with an effective batch size that is equivalent to training on 8 GPUs:
```
CUDA_VISIBLE_DEVICES=0 python train.py --update-freq 8 (...)
```

# Training with half precision floating point (FP16)

> Note: FP16 training requires a Volta GPU and CUDA 9.1 or greater

Recent GPUs enable efficient half precision floating point computation, e.g., using [Nvidia Tensor Cores](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html).

Fairseq supports FP16 training with the `--fp16` flag:
```
python train.py --fp16 (...)
```

# Distributed training

Distributed training in fairseq is implemented on top of [torch.distributed](http://pytorch.org/docs/master/distributed.html).
Training begins by launching one worker process per GPU.
These workers discover each other through a unique host and port (required), which is used to establish the initial connection.
Additionally, each worker is assigned a rank, a unique number from 0 to n-1, where n is the total number of GPUs.

If you run on a cluster managed by [SLURM](https://slurm.schedmd.com/) you can train a large English-French model on the WMT 2014 dataset on 16 nodes with 8 GPUs each (in total 128 GPUs) using this command:

```
$ DATA=... # path to the preprocessed dataset, must be visible from all nodes
$ PORT=9218 # any available TCP port that can be used by the trainer to establish initial connection
$ sbatch --job-name fairseq-py --gres gpu:8 --cpus-per-task 10 \
--nodes 16 --ntasks-per-node 8 \
--wrap 'srun --output train.log.node%t --error train.stderr.node%t.%j \
python train.py $DATA \
--distributed-world-size 128 \
--distributed-port $PORT \
--force-anneal 50 --lr-scheduler fixed --max-epoch 55 \
--arch fconv_wmt_en_fr --optimizer nag --lr 0.1,4 --max-tokens 3000 \
--clip-norm 0.1 --dropout 0.1 --criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 --wd 0.0001'
```

Alternatively you can manually start one process per GPU:
```
$ DATA=... # path to the preprocessed dataset, must be visible from all nodes
$ HOST_PORT=master.devserver.com:9218 # one of the hosts used by the job
$ RANK=... # the rank of this process, from 0 to 127 in case of 128 GPUs
$ python train.py $DATA \
--distributed-world-size 128 \
--distributed-init-method "tcp://$HOST_PORT" \
--distributed-rank $RANK \
--force-anneal 50 --lr-scheduler fixed --max-epoch 55 \
--arch fconv_wmt_en_fr --optimizer nag --lr 0.1,4 --max-tokens 3000 \
--clip-norm 0.1 --dropout 0.1 --criterion label_smoothed_cross_entropy \
--label-smoothing 0.1 --wd 0.0001
```
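
For a single machine with multiple GPUs, the same one-process-per-GPU pattern can be scripted with a shell loop (a hypothetical 8-GPU, single-node variant of the above; an untested sketch using the doc's `(...)` convention for the remaining training flags):
```
$ DATA=...  # path to the preprocessed dataset
$ for RANK in $(seq 0 7); do
>   CUDA_VISIBLE_DEVICES=$RANK python train.py $DATA \
>     --distributed-world-size 8 \
>     --distributed-rank $RANK \
>     --distributed-init-method "tcp://localhost:9218" (...) &
> done
```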

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
@@ -271,4 +122,8 @@ The license applies to the pre-trained models as well.
We also provide an additional patent grant.

# Credits
This is a PyTorch version of
[fairseq](https://github.com/facebookresearch/fairseq), a sequence-to-sequence
learning toolkit from Facebook AI Research. The original authors of this
reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam
Gross.
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = fairseq
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
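
With this Makefile in place, the docs can be built with the standard Sphinx targets, which the catch-all rule above forwards to Sphinx, e.g. (assuming Sphinx and the docs requirements are installed):
```
$ cd docs
$ make html   # output lands in _build/html
```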
9 changes: 9 additions & 0 deletions docs/_static/theme_overrides.css
@@ -0,0 +1,9 @@
.wy-table-responsive table td kbd {
white-space: nowrap;
}
.wy-table-responsive table td {
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
85 changes: 85 additions & 0 deletions docs/command_line_tools.rst
@@ -0,0 +1,85 @@
.. _Command-line Tools:

Command-line Tools
==================

Fairseq provides several command-line tools for training and evaluating models:

- :ref:`preprocess.py`: Data pre-processing: build vocabularies and binarize training data
- :ref:`train.py`: Train a new model on one or multiple GPUs
- :ref:`generate.py`: Translate pre-processed data with a trained model
- :ref:`interactive.py`: Translate raw text with a trained model
- :ref:`score.py`: BLEU scoring of generated translations against reference translations
- :ref:`eval_lm.py`: Language model evaluation


.. _preprocess.py:

preprocess.py
~~~~~~~~~~~~~
.. automodule:: preprocess

.. argparse::
:module: preprocess
:func: get_parser
:prog: preprocess.py


.. _train.py:

train.py
~~~~~~~~
.. automodule:: train

.. argparse::
:module: fairseq.options
:func: get_training_parser
:prog: train.py


.. _generate.py:

generate.py
~~~~~~~~~~~
.. automodule:: generate

.. argparse::
:module: fairseq.options
:func: get_generation_parser
:prog: generate.py


.. _interactive.py:

interactive.py
~~~~~~~~~~~~~~
.. automodule:: interactive

.. argparse::
:module: fairseq.options
:func: get_interactive_generation_parser
:prog: interactive.py


.. _score.py:

score.py
~~~~~~~~
.. automodule:: score

.. argparse::
:module: score
:func: get_parser
:prog: score.py


.. _eval_lm.py:

eval_lm.py
~~~~~~~~~~
.. automodule:: eval_lm

.. argparse::
:module: fairseq.options
:func: get_eval_lm_parser
:prog: eval_lm.py