Tags: ismstat/sockeye
Avoid circular import: move cleanup method to training.py (awslabs#932)
Fix learning-rate warmup so that it works properly with continued tr… (awslabs#926) Co-authored-by: Steven Bradtke sjbradt <[email protected]>
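The truncated entry above appears to concern learning-rate warmup when training is resumed from a checkpoint. A minimal sketch of the general idea, assuming a linear warmup keyed to the global update count (names and values are illustrative, not Sockeye's actual scheduler API):

```python
# Sketch only: a linear warmup schedule driven by the global update count.
# Restoring `step` from the checkpoint when training continues keeps the
# schedule consistent instead of re-warming from zero.

class LinearWarmupSchedule:
    def __init__(self, base_lr: float, warmup_steps: int):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps

    def lr(self, step: int) -> float:
        # Scale linearly from 0 to base_lr over the first `warmup_steps` updates.
        if self.warmup_steps > 0 and step < self.warmup_steps:
            return self.base_lr * step / self.warmup_steps
        return self.base_lr


schedule = LinearWarmupSchedule(base_lr=3e-4, warmup_steps=4000)
resumed_step = 2500  # e.g. restored from a checkpoint when continuing training
print(schedule.lr(resumed_step))  # partway through warmup, not restarted at 0
```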
Interleaved Multi-head Attention Operators (awslabs#884) Replaces the batched dot product in multi-head attention with interleaved_matmul attention operators to improve performance. Also changes data from batch-major to time-major format within the model to comply with the new operators' requirements.
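A rough illustration of how such interleaved self-attention operators are called, assuming MXNet >= 1.6 and its contrib ops interleaved_matmul_selfatt_qk / interleaved_matmul_selfatt_valatt; the shapes and layout shown here are an assumption for illustration, not a copy of Sockeye's implementation. The operators expect time-major input with queries, keys and values projected and interleaved per head, which is why the model switches away from batch-major data.

```python
# Illustrative sketch, not Sockeye's code. Assumes MXNet >= 1.6 contrib ops.
import mxnet as mx

seq_len, batch, heads, head_dim = 10, 4, 8, 64

# Time-major, per-head interleaved Q/K/V projections:
# shape (seq_len, batch, heads * 3 * head_dim).
qkv = mx.nd.random.normal(shape=(seq_len, batch, heads * 3 * head_dim))

# Attention scores: (batch * heads, seq_len, seq_len).
scores = mx.nd.contrib.interleaved_matmul_selfatt_qk(qkv, heads=heads)
probs = mx.nd.softmax(scores, axis=-1)

# Weighted values, back in time-major layout: (seq_len, batch, heads * head_dim).
context = mx.nd.contrib.interleaved_matmul_selfatt_valatt(qkv, probs, heads=heads)
print(context.shape)
```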
Add SSRU layer and decoder (awslabs#851) - Adds layers.SSRU, which implements a Simpler Simple Recurrent Unit as described by Kim et al., 2019. - Adds the ssru_transformer option to --decoder, which enables the use of SSRUs as a replacement for the decoder-side self-attention layers. - Reduces the number of arguments for MultiHeadSelfAttention.hybrid_forward(): previous_keys and previous_values should now be passed together as previous_states, a list containing two symbols.
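For reference, a toy numpy sketch of the SSRU recurrence as described by Kim et al., 2019: a forget gate mixes the previous cell state with a linear transform of the input, and a ReLU replaces the output nonlinearity. Parameter names and shapes are illustrative; this is not Sockeye's layers.SSRU code.

```python
# Toy sketch of the SSRU recurrence (Kim et al., 2019); illustrative only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ssru(inputs, W_f, b_f, W, c0):
    """inputs: (seq_len, d); returns hidden states (seq_len, d) and the last cell state."""
    c = c0
    outputs = []
    for x_t in inputs:
        f_t = sigmoid(W_f @ x_t + b_f)          # forget gate
        c = f_t * c + (1.0 - f_t) * (W @ x_t)   # cell update: no separate reset gate
        outputs.append(np.maximum(c, 0.0))      # ReLU output instead of tanh
    return np.stack(outputs), c

d = 8
rng = np.random.default_rng(0)
h, c_last = ssru(rng.normal(size=(5, d)),
                 rng.normal(size=(d, d)), np.zeros(d),
                 rng.normal(size=(d, d)), np.zeros(d))
print(h.shape, c_last.shape)
```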
Option to suppress console output for secondary workers (awslabs#841)