
Streaming Speech Recognition on Android with Emformer-RNNT-based Model

Introduction

In the Speech Recognition Android demo app, we showed how to use the wav2vec 2.0 model to perform non-continuous speech recognition on Android. Here we go one step further and use a torchaudio Emformer-RNNT-based ASR model on Android to perform streaming speech recognition.

Prerequisites

PyTorch 1.12 and torchaudio 0.12 or above (Optional)
Python 3.8 (Optional)
Android PyTorch library org.pytorch:pytorch_android_lite:1.12.2
Android Studio 4.0.1 or later

Quick Start

1. Get the Repo

Simply run the commands below:

git clone https://github.com/pytorch/android-demo-app
cd android-demo-app/StreamingASR

If you don't have PyTorch 1.12 and torchaudio 0.12 installed, or just want to try the demo app quickly, download the optimized scripted model file streaming_asrv2.ptl, drag and drop it into the StreamingASR/app/src/main/assets folder inside android-demo-app/StreamingASR, and skip to Step 3.

2. Test and Prepare the Model

To install PyTorch 1.12, torchaudio 0.12, and other required packages (numpy, pyaudio, and fairseq), do something like this:

conda create -n pt1.12 python=3.8.5
conda activate pt1.12
pip install torch torchaudio numpy pyaudio fairseq
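
After installing, it can help to confirm that the environment meets the minimum versions listed under Prerequisites. The following is a minimal sketch and not part of the repo: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the comparison only looks at the leading numeric components of each version string (so a suffix such as `+cu113` is ignored).

```python
import re
from importlib import metadata

# Minimum versions taken from the Prerequisites section.
MINIMUMS = {"torch": "1.12", "torchaudio": "0.12"}


def parse_version(version):
    """Return the leading numeric components, e.g. '1.12.1+cu113' -> (1, 12, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, so that 1.9 < 1.12."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so (1, 12) and (1, 12, 0) compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each package name to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    labels = {True: "OK", False: "too old", None: "not installed"}
    for name, ok in check_environment().items():
        print(f"{name}: {labels[ok]}")
```

Comparing versions as tuples of integers rather than as strings avoids treating 1.9 as newer than 1.12.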

First, create the model file scripted_wrapper_tuple.pt by running python generate_ts.py.

Then, to test the model, run python run_sasr.py. After you see:

Initializing model...
Initialization complete.

say something like "good afternoon happy new year", and you'll likely see the streaming recognition results good afternoon happy new year while you speak. Hit Ctrl-C to end.
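
Results appear while you speak because a streaming model consumes audio in short segments instead of waiting for the whole utterance, and Emformer additionally looks at a small amount of right context (lookahead) past each segment. The sketch below illustrates only that windowing idea and is not taken from run_sasr.py: the function name `iter_segments` is hypothetical, and the segment and right-context sizes in the usage example are illustrative values, not read from the model.

```python
def iter_segments(samples, segment_length, right_context_length):
    """Yield windows of segment_length + right_context_length samples.

    Each window starts segment_length samples after the previous one, so the
    right-context tail of one window overlaps the start of the next. The last
    window is zero-padded to the full window size.
    """
    if segment_length <= 0 or right_context_length < 0:
        raise ValueError("segment_length must be > 0 and right_context_length >= 0")
    window = segment_length + right_context_length
    for start in range(0, len(samples), segment_length):
        chunk = list(samples[start:start + window])
        # Pad the final, shorter chunk with silence.
        chunk.extend([0.0] * (window - len(chunk)))
        yield chunk


if __name__ == "__main__":
    # One second of silence at 16 kHz, with illustrative segment sizes.
    one_second = [0.0] * 16000
    segments = list(iter_segments(one_second, segment_length=2560, right_context_length=640))
    print(len(segments), "segments of", len(segments[0]), "samples each")
```

In a real pipeline each window would be passed to the feature extractor and decoder in turn, with the decoder state carried over from one segment to the next.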

Finally, to optimize and convert the model to the format that can run on Android, run the following commands:

mkdir -p StreamingASR/app/src/main/assets
python save_model_for_mobile.py
mv streaming_asrv2.ptl StreamingASR/app/src/main/assets
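
For readers curious about what the conversion step involves, the following is a minimal sketch of converting a TorchScript model for the PyTorch lite interpreter. It is an assumption about the general approach, not the contents of save_model_for_mobile.py: the function name `convert_for_mobile` is hypothetical, and the default file names are simply the ones mentioned in this README.

```python
from pathlib import Path


def convert_for_mobile(scripted_path="scripted_wrapper_tuple.pt",
                       output_path="streaming_asrv2.ptl"):
    """Optimize a scripted model and save it in lite-interpreter format."""
    src = Path(scripted_path)
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; run generate_ts.py first")

    # Imported here so the path check above works even without PyTorch installed.
    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    module = torch.jit.load(str(src))
    module.eval()
    # Apply mobile-oriented graph optimizations to the scripted module.
    optimized = optimize_for_mobile(module)
    # Write the .ptl file that the lite interpreter on Android can load.
    optimized._save_for_lite_interpreter(str(output_path))
    return Path(output_path)


if __name__ == "__main__":
    print("Saved", convert_for_mobile())
```

The resulting .ptl file is what needs to end up in the app's assets folder.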

3. Build and Run with Android Studio

Start Android Studio, open the project located in android-demo-app/StreamingASR/StreamingASR, then build and run the app on a physical Android device (not an emulator). After the app launches, tap the Start button and start speaking. Example recognition results are shown in screenshot1.png, screenshot2.png, and screenshot3.png in this directory.

Librosa C++, Eigen, and JNI

The first version of this demo uses a C++ port of Librosa, a popular Python audio processing library, to perform the MelSpectrogram transform, because torchaudio before version 0.11 doesn't support FFT on Android (see here). Calling the Librosa C++ port through JNI (Java Native Interface) makes the MelSpectrogram transform possible on Android. The port in turn requires Eigen, a C++ template library for linear algebra, so both the port and the Eigen library are included in the first version of the demo app and built as native code accessed through JNI.

If you are interested in an example of using native C++ to add operations not yet supported in PyTorch or one of its domain libraries, see here for the first version of the demo.
