- Requires Python 3.10 and FFmpeg installed
- Recommended: create a virtual environment (venv) to keep dependencies isolated, then activate it
python -m venv venv
# Linux
. venv/bin/activate
# Windows
. venv/Scripts/activate
- Install the requirements
# Make sure (venv) appears in your terminal prompt
pip install -r requirements.txt
- If the ffmpeg install fails: with the venv activated, uninstall and reinstall ffmpeg-python
pip uninstall ffmpeg
pip uninstall ffmpeg-python
pip install ffmpeg-python
Usage:
python app.py stuff.mp3
The result will be stored in stuff_transcribe_result.txt
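The output filename is derived from the input filename. A minimal sketch of that naming convention (the `result_path` helper is hypothetical, not part of app.py):

```python
from pathlib import Path

def result_path(audio_path: str) -> Path:
    """Derive the transcript filename from the input audio filename."""
    p = Path(audio_path)
    # stuff.mp3 -> stuff_transcribe_result.txt
    return p.with_name(p.stem + "_transcribe_result.txt")
```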
- Available Whisper model sizes: "tiny", "base", "small", "medium", "large"
- The model size is passed as the optional second argument
- By default, the large model is used
Run on the tiny model:
python app.py stuff.mp3 tiny
Run on the base model:
python app.py stuff.mp3 base
- MoviePy splits the mp3 file into smaller audio chunks
- The audio chunks are stored in
./tmp_chunks_audio_speach2text/
- Each chunk (taken in order) is transcribed by the Whisper model (chunks are at most 60 seconds long)
- All chunk transcriptions are appended together
- The full result is written to
stuff_transcribe_result.txt
- The result is also printed to stdout
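The chunking step above can be sketched as follows: this computes the 60-second boundaries at which MoviePy would cut the audio before each piece is handed to Whisper (the `chunk_spans` function is illustrative, not the actual code in app.py):

```python
def chunk_spans(duration: float, max_len: float = 60.0):
    """Return (start, end) offsets in seconds covering the whole file,
    each span at most max_len seconds long."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + max_len, duration)))
        start += max_len
    return spans

# Each span would then be cut out with MoviePy, transcribed by Whisper
# in order, and the per-chunk texts appended into one transcript.
```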
Once the process is done, you can delete the
tmp_chunks_audio_speach2text/
folder.
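The cleanup can be done by hand or scripted. A minimal sketch using only the standard library (assumes the default chunk folder name from above):

```python
import shutil
from pathlib import Path

tmp_dir = Path("tmp_chunks_audio_speach2text")
if tmp_dir.exists():
    # Remove the chunk folder and everything inside it
    shutil.rmtree(tmp_dir)
```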