A list of hot directions in industrial speech recognition, e.g., streaming ASR / non-autoregressive ASR / WFST-based ASR ...
If you are interested in this repo, pull requests are welcome.
- Standard RNA: Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping (Interspeech 2017)
- Extended RNA: Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin (Interspeech 2018)
- Transformer equipped RNA: Self-attention Aligner: A Latency-control End-to-end Model for ASR Using Self-attention Network and Chunk-hopping (ICASSP 2019)
- CIF: CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition (ICASSP 2020)
- CIF: A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition (Interspeech 2020)
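The mechanism behind the CIF papers above is compact enough to sketch: a per-frame weight is accumulated until it crosses a threshold, at which point the weighted sum of frames "fires" as one label-level embedding and the leftover weight starts the next one. A minimal single-utterance sketch in PyTorch (the weight predictor, training-time weight scaling, and batching are all omitted):

```python
import torch

def cif(encoder_out: torch.Tensor, alpha: torch.Tensor, threshold: float = 1.0):
    """Continuous integrate-and-fire over one utterance.

    encoder_out: (T, D) frame-level encoder states.
    alpha:       (T,)   per-frame weights in [0, 1], e.g. from a sigmoid head.
    Returns (U, D) integrated label-level embeddings.
    """
    fired = []
    accum = 0.0                                # accumulated weight so far
    frame = torch.zeros_like(encoder_out[0])   # partially integrated embedding
    for t in range(encoder_out.size(0)):
        a = alpha[t].item()
        if accum + a < threshold:              # keep integrating
            accum += a
            frame = frame + a * encoder_out[t]
        else:                                  # fire: spend just enough weight to hit the threshold
            needed = threshold - accum
            fired.append(frame + needed * encoder_out[t])
            accum = a - needed                 # leftover weight opens the next label
            frame = accum * encoder_out[t]
    return torch.stack(fired) if fired else encoder_out.new_zeros(0, encoder_out.size(1))
```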
- Standard RNN-T: Streaming E2E Speech Recognition For Mobile Devices (ICASSP 2019)
- Latency Controlled RNN-T: RNN-T For Latency Controlled ASR With Improved Beam Search (arXiv 2019)
- Transformer equipped RNN-T: Self-Attention Transducers for End-to-End Speech Recognition (Interspeech 2019)
- Transformer equipped RNN-T: Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (ICASSP 2020)
- Transformer equipped RNN-T: A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency (ICASSP 2020)
- Tricks for RNN-T Training: Towards Fast And Accurate Streaming E2E ASR (ICASSP 2020)
- Knowledge Distillation for RNN-T: Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech Recognition (Interspeech 2020)
- Transfer Learning for RNN-T: Transfer Learning Approaches for Streaming End-to-End Speech Recognition System (Interspeech 2020)
- Exploration on RNN-T: Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer (Interspeech 2020)
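For hands-on work with the RNN-T papers above, the transducer loss is available off the shelf; a minimal sketch using `torchaudio.functional.rnnt_loss` (present in torchaudio >= 0.10), where the joint-network output has shape `(batch, T, U + 1, vocab)`:

```python
import torch
import torchaudio

B, T, U, V = 2, 50, 10, 32                                  # batch, frames, target length, vocab
logits = torch.randn(B, T, U + 1, V, requires_grad=True)    # toy joint-network output
targets = torch.randint(1, V, (B, U), dtype=torch.int32)    # label 0 reserved for blank
logit_lengths = torch.full((B,), T, dtype=torch.int32)
target_lengths = torch.full((B,), U, dtype=torch.int32)

loss = torchaudio.functional.rnnt_loss(
    logits, targets, logit_lengths, target_lengths, blank=0
)
loss.backward()
```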
- Monotonic Attention: Monotonic Chunkwise Attention (ICLR 2018)
- Enhanced Monotonic Attention: Enhancing Monotonic Multihead Attention for Streaming ASR (Interspeech 2020)
- Minimum Latency Training based on Monotonic Attention: Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR (ICASSP 2020)
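At test time, the hard monotonic attention behind the three papers above reduces to a left-to-right scan; a sketch of the greedy selection rule, assuming the per-step attention energies are already computed (MoChA then applies soft attention over a fixed-size chunk ending at the selected frame):

```python
import torch

def hard_monotonic_attend(energies: torch.Tensor, prev_pos: int) -> int:
    """Greedy test-time rule of hard monotonic attention: starting from the
    previously attended frame, move right and stop at the first frame whose
    selection probability sigmoid(energy) exceeds 0.5.

    energies: (T,) monotonic-attention energies for the current output step.
    Returns the attended frame index, or T if nothing is selected yet.
    """
    T = energies.size(0)
    for j in range(prev_pos, T):
        if torch.sigmoid(energies[j]) > 0.5:
            return j
    return T   # no selection: wait for more input (streaming) or stop
```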
- Triggered Attention: Triggered Attention for End-to-End Speech Recognition (ICASSP 2019)
- Triggered Attention for Transformer: Streaming Automatic Speech Recognition With The Transformer Model (ICASSP 2020)
- Block-synchronous: Streaming Transformer ASR with Blockwise Synchronous Inference (ASRU 2019)
- Block-synchronous with chunk reuse: Transformer Online CTC/Attention E2E Speech Recognition Architecture (ICASSP 2020)
- Block-synchronous with RNN-T like decoding rule: Synchronous Transformers For E2E Speech Recognition (ICASSP 2020)
- Scout-synchronous: Low Latency End-to-End Streaming Speech Recognition with a Scout Network (Interspeech 2020)
- CTC-synchronous: CTC-synchronous Training for Monotonic Attention Model (Interspeech 2020)
- Memory Augmented Attention: Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory (Interspeech 2020)
- Memory Augmented Attention: Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition (Interspeech 2020)
- Optimized Beam Search: High Performance Sequence-to-Sequence Model for Streaming Speech Recognition (Interspeech 2020)
- Transformer Transducer: One Model Unifying Streaming And Non-Streaming Speech Recognition (arXiv 2020)
- Universal ASR: Unify And Improve Streaming ASR With Full-Context Modeling (ICLR 2021 under double-blind review)
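Several of the block/chunk-synchronous and unified streaming/non-streaming models above boil down to a chunk-shaped self-attention mask: each frame sees its own chunk plus all previous ones, so a single encoder covers both full-context (`chunk_size >= T`) and streaming (small chunks) operation. A sketch without an explicit left-context window (per-paper variants differ):

```python
import torch

def chunk_attention_mask(num_frames: int, chunk_size: int) -> torch.Tensor:
    """Boolean (T, T) mask: entry [i, j] is True iff frame i may attend to
    frame j, i.e. iff chunk(j) <= chunk(i)."""
    chunk_idx = torch.arange(num_frames) // chunk_size   # chunk id of each frame
    return chunk_idx.unsqueeze(1) >= chunk_idx.unsqueeze(0)
```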
- MASK-Predict: Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition (arXiv 2019)
- Imputer: Imputer: Sequence Modelling via Imputation and Dynamic Programming (arXiv 2020)
- Insertion-based: Insertion-Based Modeling for End-to-End Automatic Speech Recognition (arXiv 2020)
- MASK-CTC: Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict (Interspeech 2020)
- Spike Triggered: Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition (Interspeech 2020)
- Similar to MASK-Predict: Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition (Interspeech 2020)
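Most of the non-autoregressive models above share a mask-and-fill inference loop; a rough sketch in the spirit of Mask-CTC, where `mlm_decoder` stands in for a hypothetical conditional masked-LM decoder (the signature and the fill schedule are illustrative, not any paper's exact recipe):

```python
import torch

def mask_ctc_decode(ctc_tokens, ctc_probs, mlm_decoder, mask_id,
                    threshold=0.9, iterations=2):
    """ctc_tokens:  (L,) greedy CTC output after collapsing blanks/repeats.
    ctc_probs:   (L,) per-token CTC confidence.
    mlm_decoder: hypothetical callable mapping (1, L) token ids to (1, L, V) logits."""
    tokens = ctc_tokens.clone()
    tokens[ctc_probs < threshold] = mask_id              # mask low-confidence tokens
    for _ in range(iterations):
        masked = tokens == mask_id
        if not masked.any():
            break
        logits = mlm_decoder(tokens.unsqueeze(0))[0]     # (L, V)
        probs, preds = logits.softmax(-1).max(-1)
        k = max(1, int(masked.sum().item()) // 2)        # fill the most confident half
        conf = torch.where(masked, probs, torch.full_like(probs, -1.0))
        fill = conf.topk(k).indices
        tokens[fill] = preds[fill]
    return tokens
```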
- Review: Automatic Speech Recognition Errors Detection and Correction: A Review (N/A)
- LAS based: A Spelling Correction Model For E2E Speech Recognition (ICASSP 2019)
- Transformer based: An Empirical Study Of Efficient ASR Rescoring With Transformers (arXiv 2019)
- Transformer based: Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition (Interspeech 2019)
- Transformer based: Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model (ICASSP 2020)
- BERT based: Effective Sentence Scoring Method Using BERT for Speech Recognition (ACML 2019)
- BERT based: Spelling Error Correction with Soft-Masked BERT (ACL 2020)
- Parallel Rescoring: Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition (Interspeech 2020)
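For the BERT-based rescoring entries, the pseudo-log-likelihood score (mask each position in turn and sum the log-probability of the original token) can be sketched with the Hugging Face `transformers` API; the combination with the ASR score is a tunable assumption:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    """Mask each token in turn; sum BERT's log-probability of the original token."""
    ids = tok(sentence, return_tensors="pt").input_ids[0]
    score = 0.0
    for i in range(1, ids.size(0) - 1):                  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, i]
        score += logits.log_softmax(-1)[ids[i]].item()
    return score

# Rescoring an n-best list then combines this with the ASR score,
# e.g. total = asr_score + lam * pseudo_log_likelihood(hyp).
print(max(["i saw a cat", "eye saw a cat"], key=pseudo_log_likelihood))
```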
- Lightweight Low-Rank transformer: Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer (ICASSP 2020)
- Attention replacement: How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers (ICASSP 2020)
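The low-rank idea in the entry above is simply factorizing each dense projection into two skinny ones; a minimal sketch:

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A (d_in -> d_out) projection factorized at rank r, cutting the weight
    count from d_in * d_out to roughly r * (d_in + d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.up = nn.Linear(rank, d_out)

    def forward(self, x):
        return self.up(self.down(x))

# rank 64 in place of a 512x512 projection: ~66k vs ~263k weights
proj = LowRankLinear(512, 512, rank=64)
```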