Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence-to-sequence models, including:
- Convolutional Neural Networks (CNN)
- LightConv and DynamicConv models
- Long Short-Term Memory (LSTM) networks
- Transformer (self-attention) networks
Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search (Vijayakumar et al., 2016)
- sampling (unconstrained and top-k)
- large mini-batch training even on a single GPU via delayed updates
- fast half-precision floating point (FP16) training
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
We also provide pre-trained models for several benchmark translation and language modeling datasets.
- A PyTorch installation
- For training new models, you'll also need an NVIDIA GPU and NCCL
- Python version 3.6
Currently fairseq requires PyTorch version >= 1.0.0. Please follow the instructions here: https://github.com/pytorch/pytorch#installation.
If you use Docker make sure to increase the shared memory size either with
--ipc=host
or --shm-size
as command line options to nvidia-docker run
.
After PyTorch is installed, you can install fairseq with:
pip install -r requirements.txt
python setup.py build develop
The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.
- Translation: convolutional and transformer models are available
- Language Modeling: convolutional models are available
We also have more detailed READMEs to reproduce results from specific papers:
- Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions
- Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning
- Fan et al. (2018): Hierarchical Neural Story Generation
- Ott et al. (2018): Scaling Neural Machine Translation
- Gehring et al. (2017): Convolutional Sequence to Sequence Learning
- Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks
- Facebook page: https://www.facebook.com/groups/fairseq.users
- Google group: https://groups.google.com/forum/#!forum/fairseq-users
fairseq(-py) is BSD-licensed. The license applies to the pre-trained models as well. We also provide an additional patent grant.
This is a PyTorch version of fairseq, a sequence-to-sequence learning toolkit from Facebook AI Research. The original authors of this reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam Gross.