Skip to content

Latest commit

 

History

History
9 lines (6 loc) · 760 Bytes

online-alignments-pg.md

File metadata and controls

9 lines (6 loc) · 760 Bytes

TLDR; The authors use policy gradients on an RNN to train a "hard" attention mechanism that decides whether to output something at the current timestep or not. Their algorithm is online, which means it does not need to see the complete sequence before making a prediction, as is the case with soft attention. The authors evaluate their model on small- and medium-scale speech recognition tasks, where they achieve performance comparable to standard sequential models.

Notes:

  • Entropy regularization and baselines were critical to make the model learn
  • Neat trick: Increase dropout as training progresses
  • Grid LSTMs outperformed standard LSTMs