
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)


jim60105/docker-whisperX


docker-whisperX

This is the Docker image for WhisperX: Automatic Speech Recognition with Word-Level Timestamps (and Speaker Diarization)

Get the Dockerfile at GitHub, or pull the image from ghcr.io.

Available Image Tags

Warning

Due to the excessively large file sizes (40GB+), continuous integration cannot be set up for these images. As a result, they will not update automatically.
Please build them manually if they are outdated.

The image tags are formatted as WHISPER_MODEL-LANG, for example, tiny-en, base-de, or large-v2-zh.
Please note that I have not uploaded every combination.

You can find all available tags at ghcr.io.

In addition, there is also a no_model tag that does not include any pre-downloaded models, also referred to as latest.
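As a rough illustration of the tag format described above, the sketch below splits a tag back into its model and language parts. The helper name `split_tag` is hypothetical and not part of this repository. It assumes the language code is whatever follows the last hyphen, since model names such as large-v2 contain hyphens themselves.

```shell
# Split an image tag of the form WHISPER_MODEL-LANG into its two parts.
# split_tag is a hypothetical helper used only for illustration.
split_tag() {
  tag="$1"
  case "$tag" in
    no_model|latest)
      # These tags carry no pre-downloaded model.
      echo "model=none lang=none" ;;
    *-*)
      # %-*  strips the shortest "-suffix"  -> model name
      # ##*- strips the longest  "prefix-"  -> language code
      echo "model=${tag%-*} lang=${tag##*-}" ;;
    *)
      echo "unrecognized tag: $tag" >&2
      return 1 ;;
  esac
}

split_tag tiny-en
split_tag large-v2-zh
split_tag no_model
```

Splitting on the last hyphen rather than the first is what keeps large-v2-zh from being read as model "large" and language "v2-zh".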

Building the Docker Image

Important

Clone the Git repository recursively to include submodules:
git clone --recursive https://github.com/jim60105/docker-whisperX.git

Build Arguments

The Dockerfile builds an image with the models pre-downloaded. It accepts two build arguments: LANG and WHISPER_MODEL.

  • LANG: The language to transcribe. The default is en. See here for supported languages.

  • WHISPER_MODEL: The model name. The default is base. See faster-whisper for supported models.

Build Command

For example, if you want to build the image with ja language and large-v2 model:

docker build --build-arg LANG=ja --build-arg WHISPER_MODEL=large-v2 -t whisperx:large-v2-ja .
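If you need several model and language combinations, the build command above can be wrapped in a loop. The following is a minimal sketch, not a script shipped with this repository: `DRY_RUN`, `MODELS`, `LANGS`, and `build_cmd` are hypothetical names, and the model and language lists are only examples. With `DRY_RUN=1` (the default in this sketch) it prints the commands instead of running them, which is useful given how large the images are.

```shell
# Print (or run) one docker build per model/language pair.
# All names here are illustrative assumptions, not part of the repository.
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands
MODELS="base large-v2"      # example model names
LANGS="en ja"               # example language codes

build_cmd() {
  # $1 = model, $2 = language code
  echo "docker build --build-arg LANG=$2 --build-arg WHISPER_MODEL=$1 -t whisperx:$1-$2 ."
}

for model in $MODELS; do
  for lang in $LANGS; do
    cmd="$(build_cmd "$model" "$lang")"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$cmd"
    else
      sh -c "$cmd"
    fi
  done
done
```

The loop variable is named `lang` rather than `LANG` on purpose: `LANG` is also the shell's locale variable, so overwriting it in the script could change the behavior of other commands.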

Usage Command

Mount the current directory as /app and run WhisperX with additional input arguments:

docker run --gpus all -it -v ".:/app" whisperx:large-v2-ja -- --output_format srt audio.mp3

Note

Remember to put -- before the WhisperX arguments.
The --model and --language arguments are already set in the Dockerfile, so you do not need to specify them.

LICENSE

The main program, WhisperX, is distributed under the BSD-4 license.
Please refer to the git submodules for their respective source code licenses.

The Dockerfile from this repository is licensed under MIT.
