Inference using a DeepSpeech pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed further down in this README.
- The Python package/language binding
- The Node.JS package/language binding
- The Command-Line client
- The .NET client/language binding
Running deepspeech might require some runtime dependencies to be already installed on your system, as listed below:
- sox - The Python and Node.JS clients use SoX to resample files to 16kHz.
- libgomp1 - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
- libstdc++ - Standard C++ Library implementation. Some people have had to install this manually.
- libpthread - On Linux, some people have had to install libpthread manually.
Please refer to your system's documentation on how to install these dependencies.
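As a rough pre-flight check for the SoX dependency and the 16kHz requirement mentioned above, the following sketch uses only the Python standard library. The helper names `sox_available` and `needs_resampling` are hypothetical and are not part of the DeepSpeech package.

```python
import shutil
import wave

TARGET_RATE_HZ = 16000  # the 16kHz rate mentioned in the dependency list


def sox_available():
    """Return True if a `sox` executable is found on PATH."""
    return shutil.which("sox") is not None


def needs_resampling(wav_path, target_rate=TARGET_RATE_HZ):
    """Return True if the WAV file's sample rate differs from target_rate."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() != target_rate
```

If `needs_resampling` returns True and `sox_available` returns False, installing SoX first is probably the simplest fix.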
The GPU-capable builds (Python, Node.JS, C++, etc.) depend on the same CUDA runtime as upstream TensorFlow. Currently, with TensorFlow 1.14, they depend on CUDA 10.0 and cuDNN v7.5. See the TensorFlow documentation.
If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page. Alternatively, you can run the following command to download and unzip the model files in your current directory:
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz
tar xvfz deepspeech-0.5.1-models.tar.gz
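After extracting the archive, it can help to confirm that the files used by the example commands in this README are present. The sketch below is one way to do that. The file names are taken from those example commands, the directory to check is passed in by the caller (the name of the extracted directory is not assumed here), and `missing_model_files` is a hypothetical helper.

```python
import os

# File names taken from the example deepspeech commands in this README.
EXPECTED_MODEL_FILES = ("output_graph.pbmm", "alphabet.txt", "lm.binary", "trie")


def missing_model_files(model_dir):
    """Return the expected model files that are not present in model_dir."""
    return [
        name
        for name in EXPECTED_MODEL_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

An empty list means all four files were found in the given directory.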
DeepSpeech models are versioned to keep you from trying to use an incompatible graph with a newer client after a breaking change was made to the code. If you get an error saying your model file version is too old for the client, you should either upgrade to a newer model release, re-export your model from the checkpoint using a newer version of the code, or downgrade your client if you need to use the old model and can't re-export it.
Pre-built binaries that can be used for performing inference with a trained model can be installed with pip3. You can then use the deepspeech binary to do speech-to-text on an audio file.
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in this documentation.
We will continue under the assumption that you already have your system properly set up to create new virtual environments.
In creating a virtual environment you will create a directory containing a python3 binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on $HOME/tmp/deepspeech-venv. You can create it using this command:
$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
Once this command completes successfully, the environment will be ready to be activated.
Each time you need to work with DeepSpeech, you have to activate this virtual environment. This is done with this simple command:
$ source $HOME/tmp/deepspeech-venv/bin/activate
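If you are unsure whether the environment is currently active, a script can check for it before installing anything. This is a minimal sketch, assuming the usual interpreter conventions: older virtualenv releases set `sys.real_prefix`, while venv and newer virtualenv releases make `sys.base_prefix` differ from `sys.prefix`. The function name is hypothetical.

```python
import sys


def in_virtual_environment():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases expose the original prefix as sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases: base_prefix differs from prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```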
Once your environment has been set up and loaded, you can use pip3 to manage packages locally. On a fresh setup of the virtualenv, you will have to install the DeepSpeech wheel. You can check if deepspeech is already installed with pip3 list.
To perform the installation, use pip3 as follows:
$ pip3 install deepspeech
If deepspeech is already installed, you can update it as follows:
$ pip3 install --upgrade deepspeech
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
$ pip3 install deepspeech-gpu
See the release notes to find which GPUs are supported. Please ensure you have the required CUDA dependency.
You can update deepspeech-gpu as follows:
$ pip3 install --upgrade deepspeech-gpu
In both cases, pip3 should take care of installing all the required dependencies. After installation has finished, you should be able to call deepspeech from the command line.
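To confirm the installation from Python rather than by eye, a sketch like the following checks both the importable module and the command-line entry point. It assumes the installed module is importable under the name `deepspeech`. The helper `deepspeech_install_status` is hypothetical.

```python
import importlib.util
import shutil


def deepspeech_install_status():
    """Report whether the module and the CLI entry point can be found."""
    return {
        # Assumption: the wheel installs a top-level module named "deepspeech".
        "module_importable": importlib.util.find_spec("deepspeech") is not None,
        # The deepspeech executable should be on PATH inside the active environment.
        "cli_on_path": shutil.which("deepspeech") is not None,
    }


if __name__ == "__main__":
    print(deepspeech_install_status())
```

If both values are False, the virtual environment is probably not active or the package has not been installed in it.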
Note: the following command assumes you downloaded the pre-trained model.
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
The arguments --lm and --trie are optional and represent a language model.
See client.py for an example of how to use the package programmatically.
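If you only need transcripts and would rather not depend on the binding's API, one alternative is to wrap the command-line client shown above. The sketch below is not how client.py works. It simply builds the same command with the flags documented in this README and runs it with `subprocess`. It assumes that the transcript is written to standard output, and both function names are hypothetical.

```python
import subprocess


def build_deepspeech_command(model, alphabet, audio, lm=None, trie=None,
                             executable="deepspeech"):
    """Build the argument list for the deepspeech command-line client."""
    command = [executable, "--model", model, "--alphabet", alphabet]
    # --lm and --trie are optional; pass them only when both are given.
    if lm is not None and trie is not None:
        command += ["--lm", lm, "--trie", trie]
    command += ["--audio", audio]
    return command


def transcribe(model, alphabet, audio, lm=None, trie=None):
    """Run the client and return its standard output.

    Assumption: the transcript is printed to stdout.
    """
    command = build_deepspeech_command(model, alphabet, audio, lm, trie)
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

For example, `transcribe("models/output_graph.pbmm", "models/alphabet.txt", "my_audio_file.wav")` runs the client without a language model.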
You can download the Node.JS bindings using npm:
npm install deepspeech
Please note that, as of now, we only support Node.JS versions 4, 5 and 6. Once SWIG supports newer versions, we can build for them.
Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the GPU specific package as follows:
npm install deepspeech-gpu
See the release notes to find which GPUs are supported. Please ensure you have the required CUDA dependency.
See client.js for an example of how to use the bindings. Alternatively, download the wav example.
To download the pre-built binaries for the deepspeech command-line (compiled C++) client, use util/taskcluster.py:
python3 util/taskcluster.py --target .
or if you're on macOS:
python3 util/taskcluster.py --arch osx --target .
Also, if you need binaries from a version other than current master, such as v0.2.0-alpha.6, you can use --branch:
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
The script taskcluster.py will download native_client.tar.xz (which includes the deepspeech binary, generate_trie and associated libraries) and extract it into the current folder. By default, taskcluster.py downloads binaries for Linux/x86_64, but you can override that behavior with the --arch parameter. See the help info with python3 util/taskcluster.py -h for more details. Specific branches of DeepSpeech or TensorFlow can be specified as well.
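The choice between the commands above can be scripted. The following sketch picks the `--arch osx` variant on macOS and otherwise relies on the default. It uses only the flags shown in this README, and `build_taskcluster_command` is a hypothetical helper, not part of the repository.

```python
import platform


def build_taskcluster_command(target=".", branch=None, system=None):
    """Build the util/taskcluster.py invocation for the current platform."""
    # platform.system() returns "Darwin" on macOS.
    system = platform.system() if system is None else system
    command = ["python3", "util/taskcluster.py"]
    if system == "Darwin":
        command += ["--arch", "osx"]
    if branch is not None:
        command += ["--branch", branch]
    command += ["--target", target]
    return command


if __name__ == "__main__":
    print(" ".join(build_taskcluster_command()))
```

The resulting list can be passed to `subprocess.run` from the repository root. Other `--arch` values are not covered here, so check the script's help output for those.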
Note: the following command assumes you downloaded the pre-trained model.
./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav
See the help output with ./deepspeech -h and the native client README for more details.
If pre-built binaries aren't available for your system, you'll need to install them from scratch. Follow the native_client installation instructions in native_client/README.md.
In addition to the bindings above, third-party developers have started to provide bindings to other languages:
- Asticode provides Golang bindings in its go-astideepspeech repo.
- RustAudio provides a Rust binding, the installation and use of which is described in their deepspeech-rs repo.
- stes provides preliminary PKGBUILDs to install the client and Python bindings on Arch Linux in the arch-deepspeech repo.
- gst-deepspeech provides a GStreamer plugin which can be used from any language with GStreamer bindings.