Skip to content

Tags: Hiroshiba/fairseq

Tags

v0.10.2

Toggle v0.10.2's commit message

v0.10.1

Toggle v0.10.1's commit message
v0.10.0 -> v0.10.1

Bug fixes

v0.10.0

Toggle v0.10.0's commit message
v0.9.0 -> v0.10.0

It's been a long time since our last release (0.9.0) nearly a year ago! There
have been numerous changes and new features added since then, which we've tried
to summarize below. While this release carries the same major version as our
previous release (0.x.x), if you have code that relies on 0.9.0, it is likely
you'll need to adapt it before updating to 0.10.0.

Looking forward, this will also be the last significant release with the
0.x.x numbering. The next release will be 1.0.0 and will include a major
migration to the [Hydra configuration system](https://github.com/facebookresearch/hydra),
with an eye towards modularizing fairseq to be more usable as a library.

Changelog:

New papers:
- [Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)](https://github.com/pytorch/fairseq/tree/master/examples/layerdrop/README.md)
- [MBART: Multilingual Denoising Pre-training for Neural Machine Translation ({Liu*,Gu*,Goyal*} et al., 2020)](https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md)
- [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2019)](https://github.com/pytorch/fairseq/blob/master/examples/byte_level_bpe/README.md)
- [Training with Quantization Noise for Extreme Model Compression ({Fan*,Stock*} et al., 2019)](https://github.com/pytorch/fairseq/blob/master/examples/quant_noise/README.md)
- [Monotonic Multihead Attention (Ma et al., 2020)](https://github.com/pytorch/fairseq/blob/master/examples/simultaneous_translation/README.md)
- [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)
- [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md)
- [Lexically constrained decoding with dynamic beam allocation](examples/constrained_decoding/README.md)
- [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples/pointer_generator/README.md)
- [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)
- [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
- [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)
- [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al. 2020)](examples/rxf/README.md)

Major new features:
- TorchScript support for Transformer and SequenceGenerator (PyTorch 1.6+ only)
- Model parallel training support (see [Megatron-11b](https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b))
- TPU support via `--tpu` and `--bf16` options (7751229)
- Added [VizSeq (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
- Migrated to Python logging (fb76dac)
- Added “SlowMo” distributed training backend (0dac0ff)
- Added Optimizer State Sharding (ZeRO) (5d7ed6a)
- Added several features to improve speech recognition support in fairseq: CTC criterion, external ASR decoder support (currently only wav2letter decoder) with KenLM and fairseq language model fusion

Minor features:
- Added `--patience` for early stopping
- Added `--shorten-method=[none|truncate|random_crop]` to language modeling (and other) tasks
- Added `--eval-bleu` for computing BLEU scores during training (60fbf64)
- Added support for training huggingface models (e.g. `hf_gpt2`) (2728f9b)
- Added FusedLAMB optimizer (`--optimizer=lamb`) (f75411a)
- Added LSTM-based language model (`lstm_lm`) (9f4256e)
- Added dummy tasks and models for benchmarking (91f0534; a541b19)
- Added tutorial and pretrained models for paraphrasing (630701e)
- Support quantization for Transformer (6379573)
- Support multi-GPU validation in fairseq-validate (2f7e3f3)
- Support batched inference in hub interface (3b53962)
- Support for language model fusion in standard beam search (5379461)

Breaking changes:
- Updated requirements to Python 3.6+ and PyTorch 1.5+
- Main entry point scripts (eval_lm.py, generate.py, etc.) removed from root directory into `fairseq_cli`
- Changed format for generation output; `H-` now corresponds to tokenized system outputs and newly added `D-` lines correspond to detokenized outputs (f353913)
- We now log the stats from the log-interval (displayed as `train_inner`) instead of a rolling average over each epoch.
- SequenceGenerator/Scorer does not print alignment by default, re-enable with `--print-alignment`
- Print base 2 scores in generation scripts (660d69f)
- Incremental decoding interface changed to use `FairseqIncrementalState` (4e48c4a; 88185fc)
- Refactor namespaces in Criterions to support library usage (introduce `LegacyFairseqCriterion` for BC) (46b773a)
- Deprecate `FairseqCriterion::aggregate_logging_outputs` interface, use `FairseqCriterion::reduce_metrics` instead (8679339)
- Moved `fairseq.meters` to `fairseq.logging.meters` and added new metrics aggregation module (`fairseq.logging.metrics`) (1e324a5; f8b795f)
- Reset mid-epoch stats every log-interval steps (244835d)
- Ignore duplicate entries in dictionary files (dict.txt) and support manual overwrite with `#fairseq:overwrite` option (dd1298e; 937535d)
- Use 1-based indexing for epochs everywhere (aa79bb9)

Minor interface changes:
- Added `FairseqTask::begin_epoch` hook (122fc1d)
- `FairseqTask::build_generator` interface changed (cd2555a)
- Change `RobertaModel` base class to `FairseqEncoder` (307df56)
- Expose `FairseqOptimizer.param_groups` property (8340b2d)
- Deprecate `--fast-stat-sync` and replace with `FairseqCriterion::logging_outputs_can_be_summed` interface (fe6c2ed)
- `--raw-text` and `--lazy-load` are fully deprecated; use `--dataset-impl` instead
- Mixture of expert tasks moved to `examples/` (8845dcf)

Performance improvements:
- Use cross entropy from apex for improved memory efficiency (5065077)
- Added buffered dataloading (`--data-buffer-size`) (4115317)

v0.9.0

Toggle v0.9.0's commit message
v0.8.0 -> v0.9.0 (facebookresearch#1452)

Summary:
Possibly breaking changes:
- Set global numpy seed (4a7cd58)
- Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e)
- TransformerEncoder returns namedtuples instead of dict (27568a7)

New features:
- Add `--fast-stat-sync` option (e1ba32a)
- Add `--empty-cache-freq` option (315c463)
- Support criterions with parameters (ba5f829)

New papers:
- Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
- Levenshtein Transformer (86857a5, ...)
- Cross+Self-Attention for Transformer Models (4ac2c5f)
- Jointly Learning to Align and Translate with Transformer Models (1c66792)
- Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
- Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
- CamemBERT: a French BERT (b31849a)

Speed improvements:
- Add CUDA kernels for LightConv and DynamicConv (f840564)
- Cythonization of various dataloading components (4fc3953, ...)
- Don't project mask tokens for MLM training (718677e)
Pull Request resolved: facebookresearch#1452

Differential Revision: D18798409

Pulled By: myleott

fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727

v0.8.0

Toggle v0.8.0's commit message
v0.7.2 -> v0.8.0 (facebookresearch#1017)

Summary:
Changelog:
- Relicensed under MIT license
- Add RoBERTa
- Add wav2vec
- Add WMT'19 models
- Add initial ASR code
- Changed torch.hub interface (`generate` renamed to `translate`)
- Add `--tokenizer` and `--bpe`
- f812e52: Renamed data.transforms -> data.encoders
- 654affc: New Dataset API (optional)
- `47fd985`: Deprecate old Masked LM components
- `5f78106`: Set mmap as default dataset format and infer format automatically
- Misc fixes for sampling
- Misc fixes to support PyTorch 1.2
Pull Request resolved: facebookresearch#1017

Differential Revision: D16799880

Pulled By: myleott

fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7

v0.7.2

Toggle v0.7.2's commit message
v0.7.1 -> v0.7.2 (facebookresearch#891)

Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: facebookresearch#891

Differential Revision: D16377132

Pulled By: myleott

fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7

v0.7.1

Toggle v0.7.1's commit message
v0.7.1: fix PyPI setup and tests

Summary: Pull Request resolved: facebookresearch#818

Differential Revision: D15916265

Pulled By: myleott

fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86

v0.7.0

Toggle v0.7.0's commit message
v0.7.0 (facebookresearch#817)

Summary:
Notable (possibly breaking) changes:
- d45db80: Remove checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c2: Move LM definitions into separate files
- dffb167: Updates to model API:
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - rm unused `remove_head` functions
- 34726d5: Move `distributed_init` into `DistributedFairseqModel`
- cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db80: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a: Deprecate dummy batches
- a1c997b: Add memory mapped datasets
- 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bugfixes
Pull Request resolved: facebookresearch#817

Differential Revision: D15913844

Pulled By: myleott

fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b

v0.6.2

Toggle v0.6.2's commit message
0.6.1 -> 0.6.2 (facebookresearch#577)

Summary:
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f: Add mixture of experts code from Shen et al. (2019)
- 0049349: Add example for multilingual training
- 48d9afb: Speed improvements, including fused operators from apex
- 44d27e6: Add Tensorboard support
- d17fa85: Add Adadelta optimizer
- 9e1c880: Add `FairseqEncoderModel`
- b65c579: Add `FairseqTask.inference_step` to modularize generate.py
- 2ad1178: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: facebookresearch#577

Differential Revision: D14481233

Pulled By: myleott

fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7

v0.6.1

Toggle v0.6.1's commit message
Add fairseq to PyPI (facebookresearch#495)

Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
Pull Request resolved: facebookresearch#495

Differential Revision: D14017761

Pulled By: myleott

fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235