- Requires Python 3.10 and FFmpeg installed
- Recommended: create a virtual environment (venv) to keep dependencies isolated, then activate it
python -m venv venv
# Linux
. venv/bin/activate
# Windows
. venv/Scripts/activate
- Install the requirements
# Make sure (venv) appears in your terminal prompt
pip install -r requirements.txt
- If the ffmpeg install fails: with the venv activated, uninstall and reinstall ffmpeg-python
pip uninstall ffmpeg
pip uninstall ffmpeg-python
pip install ffmpeg-python
Usage:
python app.py stuff.mp3
The result will be stored in stuff_transcribe_result.txt
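The output filename is derived from the input filename. A minimal sketch of that naming convention (the `result_path` helper is hypothetical, not part of app.py):

```python
from pathlib import Path

def result_path(audio_path: str) -> Path:
    """Derive the transcript filename from the input audio filename."""
    p = Path(audio_path)
    # stuff.mp3 -> stuff_transcribe_result.txt
    return p.with_name(p.stem + "_transcribe_result.txt")
```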
- Available Whisper model sizes: "tiny", "base", "small", "medium", "large"
- The model size is passed as the optional second argument
- By default, the large model is used
Run on the tiny model:
python app.py stuff.mp3 tiny
Run on the base model:
python app.py stuff.mp3 base
- MoviePy splits the mp3 file into smaller audio chunks
- The audio chunks are stored in
./tmp_chunks_audio_speach2text/
- Each chunk (taken in order) is transcribed by the Whisper model (chunks are at most 60 seconds long)
- All chunk transcriptions are appended together
- The full result is written to
stuff_transcribe_result.txt
- The result is also printed to stdout
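The chunking step above can be sketched as follows: this computes the 60-second boundaries at which MoviePy would cut the audio before each piece is handed to Whisper (the `chunk_spans` function is illustrative, not the actual code in app.py):

```python
def chunk_spans(duration: float, max_len: float = 60.0):
    """Return (start, end) offsets in seconds covering the whole file,
    each span at most max_len seconds long."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + max_len, duration)))
        start += max_len
    return spans

# Each span would then be cut out with MoviePy, transcribed by Whisper
# in order, and the per-chunk texts appended into one transcript.
```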
Once the process is done, you can delete the
tmp_chunks_audio_speach2text/
folder.
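The cleanup can be done by hand or scripted. A minimal sketch using only the standard library (assumes the default chunk folder name from above):

```python
import shutil
from pathlib import Path

tmp_dir = Path("tmp_chunks_audio_speach2text")
if tmp_dir.exists():
    # Remove the chunk folder and everything inside it
    shutil.rmtree(tmp_dir)
```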