Summary:
Possibly breaking changes:
- Set global numpy seed (4a7cd58)
- Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e); see the checkpoint-upgrade sketch after this list
- TransformerEncoder returns namedtuples instead of dicts (27568a7)
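For checkpoints saved before the `in_proj_weight` split, the fused tensor can be chunked into per-projection weights at load time. The sketch below is a minimal illustration of that upgrade, not the actual conversion code in fairseq; the helper name and the assumption that the fused tensor is stacked in q, k, v order along dim 0 are mine.

```python
import torch

def split_fused_in_proj(state_dict, prefix):
    # Hypothetical helper (not fairseq's actual upgrade code): split a fused
    # in_proj_weight / in_proj_bias into separate q/k/v projection tensors,
    # assuming the fused tensors are stacked in q, k, v order along dim 0.
    for name in ("weight", "bias"):
        fused = state_dict.pop(prefix + "in_proj_" + name, None)
        if fused is None:
            continue
        q, k, v = fused.chunk(3, dim=0)
        state_dict[prefix + "q_proj." + name] = q
        state_dict[prefix + "k_proj." + name] = k
        state_dict[prefix + "v_proj." + name] = v
    return state_dict

# Toy example with embed_dim=4 (real keys would look like
# "encoder.layers.0.self_attn.in_proj_weight").
embed_dim = 4
sd = {
    "self_attn.in_proj_weight": torch.randn(3 * embed_dim, embed_dim),
    "self_attn.in_proj_bias": torch.zeros(3 * embed_dim),
}
sd = split_fused_in_proj(sd, "self_attn.")
print(sorted(sd.keys()))
# ['self_attn.k_proj.bias', 'self_attn.k_proj.weight', 'self_attn.q_proj.bias', ...]
```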
New features:
- Add `--fast-stat-sync` option (e1ba32a)
- Add `--empty-cache-freq` option (315c463)
- Support criterions with parameters (ba5f829); see the sketch after this list
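Criterions are `nn.Module`s, so the natural reading of this change is that any `nn.Parameter` registered on a criterion is now trained together with the model. Below is a minimal sketch under the 0.9.0-era criterion API (`__init__(self, args, task)`, `forward(model, sample, reduce=True)`); the criterion name and the learnable loss scale are hypothetical, not part of this PR.

```python
import torch
import torch.nn.functional as F

from fairseq.criterions import FairseqCriterion, register_criterion


@register_criterion("scaled_cross_entropy")  # hypothetical criterion name
class ScaledCrossEntropyCriterion(FairseqCriterion):
    """Toy criterion that owns a learnable parameter (a log loss scale)."""

    def __init__(self, args, task):
        super().__init__(args, task)
        self.padding_idx = task.target_dictionary.pad()
        # Learnable parameter owned by the criterion; with this release it can
        # be optimized together with the model parameters.
        self.log_scale = torch.nn.Parameter(torch.zeros(()))

    def forward(self, model, sample, reduce=True):
        net_output = model(**sample["net_input"])
        lprobs = model.get_normalized_probs(net_output, log_probs=True)
        target = model.get_targets(sample, net_output)
        loss = F.nll_loss(
            lprobs.view(-1, lprobs.size(-1)),
            target.view(-1),
            ignore_index=self.padding_idx,
            reduction="sum" if reduce else "none",
        )
        loss = loss * self.log_scale.exp()
        sample_size = sample["ntokens"]
        logging_output = {
            "loss": loss.data,
            "ntokens": sample["ntokens"],
            "sample_size": sample_size,
        }
        return loss, sample_size, logging_output
```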
New papers:
- Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)
- Levenshtein Transformer (86857a5, ...)
- Cross+Self-Attention for Transformer Models (4ac2c5f)
- Jointly Learning to Align and Translate with Transformer Models (1c66792)
- Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)
- Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)
- CamemBERT: a Tasty French Language Model (b31849a)
Speed improvements:
- Add CUDA kernels for LightConv and DynamicConv (f840564)
- Cythonization of various dataloading components (4fc3953, ...)
- Don't project mask tokens for MLM training (718677e); see the sketch after this list
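The MLM change exploits the fact that only masked positions contribute to the masked-LM loss, so the vocabulary-sized output projection only needs to run on those positions. A minimal sketch of the idea (the function and tensor names are mine, not fairseq's):

```python
import torch

def masked_lm_logits(features, masked_tokens, output_projection):
    # features:          (batch, seq_len, hidden) transformer outputs
    # masked_tokens:     (batch, seq_len) bool mask of positions to predict
    # output_projection: Linear(hidden, vocab_size)
    #
    # Index out the masked positions *before* the expensive projection, so the
    # logits tensor is (num_masked, vocab) instead of (batch, seq_len, vocab).
    masked_features = features[masked_tokens]      # (num_masked, hidden)
    return output_projection(masked_features)      # (num_masked, vocab)

# Toy example: 2 sentences of length 8, 2 masked positions each.
batch, seq_len, hidden, vocab = 2, 8, 16, 1000
features = torch.randn(batch, seq_len, hidden)
masked_tokens = torch.zeros(batch, seq_len, dtype=torch.bool)
masked_tokens[:, [1, 4]] = True
proj = torch.nn.Linear(hidden, vocab)
print(masked_lm_logits(features, masked_tokens, proj).shape)  # torch.Size([4, 1000])
```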
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1452
Differential Revision: D18798409
Pulled By: myleott
fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727