Fixes for Dockerfile.{train,build} and adjust instructions for new image
reuben committed Mar 29, 2021
1 parent 1029d06 commit 214a150
Showing 6 changed files with 68 additions and 91 deletions.
2 changes: 0 additions & 2 deletions .gitignore
@@ -32,8 +32,6 @@
/doc/.build/
/doc/xml-c/
/doc/xml-java/
-Dockerfile.build
-Dockerfile.train
doc/xml-c
doc/xml-java
doc/xml-dotnet
4 changes: 2 additions & 2 deletions Dockerfile.build.tmpl → Dockerfile.build
@@ -3,8 +3,8 @@
# Need devel version cause we need /usr/include/cudnn.h
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

-ENV STT_REPO=#STT_REPO#
-ENV STT_SHA=#STT_SHA#
+ARG STT_REPO=https://github.com/coqui-ai/STT.git
+ARG STT_SHA=origin/main

# >> START Install base software

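Because the placeholders are now build arguments with defaults rather than ENV values filled in from a template, Dockerfile.build can be built directly. A rough sketch of the default and overridden builds; the stt-build tag and branch name are illustrative:

# Build against the default repository (coqui-ai/STT) and ref (origin/main)
docker build . -f Dockerfile.build -t stt-build

# Override only when building a fork or branch, as in the updated DEPLOYMENT.rst below
docker build . -f Dockerfile.build -t stt-build --build-arg STT_SHA=origin/my-branch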
63 changes: 63 additions & 0 deletions Dockerfile.train
@@ -0,0 +1,63 @@
# Please refer to the TRAINING documentation, "Basic Dockerfile for training"

FROM tensorflow/tensorflow:1.15.4-gpu-py3
ENV DEBIAN_FRONTEND=noninteractive

# We need to purge python3-xdg because it breaks the STT install later with
# weird errors about setuptools
#
# libopus0 and libsndfile1 are dependencies for audio augmentation
#
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
cmake \
curl \
git \
libboost-all-dev \
libbz2-dev \
libopus0 \
libsndfile1 \
unzip \
wget && \
apt-get purge -y python3-xdg && \
rm -rf /var/lib/apt/lists/

# Make sure pip and its deps are up-to-date
RUN pip3 install --upgrade pip wheel setuptools

WORKDIR /code

# Tool to convert output graph for inference
RUN wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/convert_graphdef_memmapped_format.linux.amd64.zip -O temp.zip && \
unzip temp.zip && rm temp.zip

COPY native_client /code/native_client
COPY .git /code/.git
COPY training/coqui_stt_training/VERSION /code/training/coqui_stt_training/VERSION
COPY training/coqui_stt_training/GRAPH_VERSION /code/training/coqui_stt_training/GRAPH_VERSION

# Build CTC decoder first, to avoid clashes on incompatible version upgrades
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl

# Install STT
# - No need for the decoder since we did it earlier
# - There is already a correct TensorFlow GPU build installed in the base image,
# we don't want to break that
COPY setup.py /code/setup.py
COPY VERSION /code/VERSION
COPY training /code/training
RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip3 install --upgrade -e .

# Build KenLM to generate new scorers
COPY kenlm /code/kenlm
RUN cd /code/kenlm && \
mkdir -p build && \
cd build && \
cmake .. && \
make -j $(nproc)

# Copy rest of the code and test training
COPY . /code
RUN ./bin/run-ldc93s1.sh && rm -rf ~/.local/share/stt
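For reference, building and exercising this training image locally would look roughly like the following; the stt-train tag is illustrative, and --gpus all assumes Docker with the NVIDIA container toolkit installed:

# Build the training image from the repository root
docker build . -f Dockerfile.train -t stt-train

# Open a shell in the training environment (WORKDIR is /code)
docker run --gpus all -it --rm stt-train bash

# Inside the container, the same smoke test run during the build can be repeated
./bin/run-ldc93s1.sh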
68 changes: 0 additions & 68 deletions Dockerfile.train.tmpl

This file was deleted.

9 changes: 2 additions & 7 deletions doc/DEPLOYMENT.rst
@@ -181,17 +181,12 @@ Dockerfile for building from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We provide ``Dockerfile.build`` to automatically build ``libstt.so``, the C++ native client, Python bindings, and KenLM.
-You need to generate the Dockerfile from the template using:

-.. code-block:: bash
-make Dockerfile.build
-If you want to specify a different repository or branch, you can pass ``STT_REPO`` or ``STT_SHA`` parameters:
+If you want to specify a different repository or branch, you can specify the ``STT_REPO`` or ``STT_SHA`` arguments:

.. code-block:: bash
-make Dockerfile.build STT_REPO=git://your/fork STT_SHA=origin/your-branch
+docker build . -f Dockerfile.build --build-arg STT_REPO=git://your/fork --build-arg STT_SHA=origin/your-branch
.. _runtime-deps:

13 changes: 1 addition & 12 deletions doc/TRAINING.rst
@@ -88,18 +88,7 @@ Setting the ``TF_FORCE_GPU_ALLOW_GROWTH`` environment variable to ``true`` seems
Basic Dockerfile for training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-We provide ``Dockerfile.train`` to automatically set up a basic training environment in Docker. You need to generate the Dockerfile from the template using:
-This should ensure that you'll re-use the upstream Python 3 TensorFlow GPU-enabled Docker image.

-.. code-block:: bash
-make Dockerfile.train
-If you want to specify a different 🐸STT repository / branch, you can pass ``STT_REPO`` or ``STT_SHA`` parameters:

-.. code-block:: bash
-make Dockerfile.train STT_REPO=git://your/fork STT_SHA=origin/your-branch
+We provide ``Dockerfile.train`` to automatically set up a basic training environment in Docker. This should ensure that you'll re-use the upstream Python 3 TensorFlow GPU-enabled Docker image. The image can be used with ``FROM ghcr.io/coqui-ai/stt-train``.

Common Voice training data
^^^^^^^^^^^^^^^^^^^^^^^^^^
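As a sketch of the ``FROM ghcr.io/coqui-ai/stt-train`` usage mentioned in the new paragraph above, a derived image could look like this; the data path is illustrative, and it assumes the published image is built from the Dockerfile.train added in this commit:

# Derive from the published training image
FROM ghcr.io/coqui-ai/stt-train
# /code is the working directory of the base image built from Dockerfile.train
COPY my-data /code/my-data
# ...training invocations from TRAINING.rst would follow here...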
