This repository contains Python scripts to perform audio segmentation and transcription using the WhisperX ASR (Automatic Speech Recognition) model. WhisperX is a pre-trained ASR model that can be used to transcribe audio files into text.
The purpose of this repository is to facilitate the process of segmenting long audio files into smaller chunks and transcribing those chunks into text. This can be useful for various applications such as creating training data for ASR models, generating subtitles for videos, or extracting specific spoken content from audio recordings.
Before using the scripts in this repository, ensure that you have the following installed:
- Python 3.x
- yt-dlp, for downloading YouTube audio files. You can download yt-dlp from https://github.com/yt-dlp/yt-dlp/releases.
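If it helps, the prerequisites above can be verified before running anything else. The following is a minimal sketch, not part of the repository's scripts; the function name `missing_prerequisites` is hypothetical, and it only checks that a suitable Python is running and that `yt-dlp` is discoverable on `PATH`.

```python
import shutil
import sys


def missing_prerequisites(tools=("yt-dlp",), min_python=(3, 0)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: the repository itself does not ship this check.
    """
    problems = []
    # sys.version_info compares like a tuple, e.g. (3, 11, 4, ...) >= (3, 0).
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites()` and printing the result gives a quick list of anything still to install.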
- Clone the repository to your local machine:

      git clone https://github.com/your-username/audio-segmentation-transcription.git
      cd audio-segmentation-transcription

- Create a virtual environment (optional but recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows, use venv\Scripts\activate

- Install the required packages:

      pip install -r requirements.txt

- Place your audio files (in WAV format) inside the full_audio_files directory.
- Create a list of YouTube links in yt_links.txt if you want to download audio from YouTube.
- Run the download_yt_files.py script to download and preprocess audio files from YouTube:

      python download_yt_files.py

- Run the main.py script to segment and transcribe the audio files:

      python main.py

  The segmented audio files and corresponding transcriptions will be saved in the output_audio_segments directory.

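To make the YouTube download step above more concrete, here is a rough sketch of how a links file could be turned into yt-dlp invocations. It is not the code in download_yt_files.py, which may work differently. The one-URL-per-line format with `#` comments, the function names, and the output template are all assumptions made for illustration. The yt-dlp options used (`--extract-audio`, `--audio-format`, `--output`) are standard ones, and audio conversion requires ffmpeg to be installed.

```python
import subprocess
from pathlib import Path


def read_links(text):
    """Parse the contents of a links file: one URL per line (assumed format).

    Blank lines and lines starting with '#' are skipped.
    """
    links = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            links.append(line)
    return links


def build_command(url, out_dir="full_audio_files"):
    """Build a yt-dlp command that extracts the audio track as WAV."""
    return [
        "yt-dlp",
        "--extract-audio",          # keep only the audio stream
        "--audio-format", "wav",    # convert to WAV (needs ffmpeg)
        "--output", f"{out_dir}/%(id)s.%(ext)s",  # name files by video id
        url,
    ]


def download_all(links_file="yt_links.txt", out_dir="full_audio_files"):
    """Download every link in links_file; raises if yt-dlp exits with an error."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in read_links(Path(links_file).read_text(encoding="utf-8")):
        subprocess.run(build_command(url, out_dir), check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.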
- Segmented Audio Files: The segmented audio files (in WAV format) will be saved in the output_audio_segments/{run_name}/audio directory.
- Transcriptions: The transcriptions for each segment will be saved in output_audio_segments/{run_name}/train.txt (for training data) and output_audio_segments/{run_name}/validation.txt (for validation data).

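The output layout described above could be produced along the following lines. This is only a sketch under stated assumptions: the `path|text` line format, the 10% validation fraction, and the names `split_entries`, `format_line`, and `write_run` are hypothetical, and the actual scripts may split and format the data differently.

```python
import random
from pathlib import Path


def split_entries(entries, validation_fraction=0.1, seed=0):
    """Shuffle (filename, text) pairs reproducibly and split them.

    Returns (train, validation). The fraction and seed are illustrative defaults.
    """
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def format_line(filename, text):
    """Format one transcript line as 'audio/<file>|<text>' (assumed format)."""
    return f"audio/{filename}|{text.strip()}"


def write_run(run_dir, entries, validation_fraction=0.1, seed=0):
    """Create <run_dir>/audio and write train.txt and validation.txt next to it."""
    run_dir = Path(run_dir)
    (run_dir / "audio").mkdir(parents=True, exist_ok=True)
    train, validation = split_entries(entries, validation_fraction, seed)
    for name, rows in (("train.txt", train), ("validation.txt", validation)):
        lines = [format_line(filename, text) for filename, text in rows]
        (run_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Using a fixed seed keeps the train/validation split stable across reruns, which makes results easier to compare.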
- Ensure that you have enough storage space, especially if dealing with large audio files, as the segmented audio files can consume significant disk space.
- You can adjust the parameters in the scripts (such as batch_size, compute_type, and language) to customize the behavior of the WhisperX model according to your requirements.
- For more information about the WhisperX ASR model, refer to the official documentation or repository of the model.
- This README was generated with a GPT model; any fixes are more than welcome 😃
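As an illustration of how the batch_size, compute_type, and language parameters mentioned in the notes above typically reach WhisperX, here is a minimal sketch. It follows the usage pattern shown in the WhisperX README (`whisperx.load_model`, `whisperx.load_audio`, `model.transcribe`) and is not a copy of main.py. The `TranscriptionConfig` class, the default values, the `"large-v2"` model name, and `format_segment` are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionConfig:
    """Illustrative settings; the defaults are assumptions, not the repo's values."""
    model_name: str = "large-v2"
    device: str = "cuda"           # use "cpu" if no GPU is available
    batch_size: int = 16           # lower this if GPU memory runs out
    compute_type: str = "float16"  # "int8" trades some accuracy for less memory
    language: str = "en"           # language code of the audio


def transcribe(audio_path, config):
    """Transcribe one file and return WhisperX's list of segment dicts."""
    import whisperx  # imported lazily so the module loads without whisperx installed

    model = whisperx.load_model(
        config.model_name,
        config.device,
        compute_type=config.compute_type,
        language=config.language,
    )
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=config.batch_size)
    return result["segments"]  # each segment carries "start", "end", and "text"


def format_segment(segment):
    """Render a segment dict as 'start-end text' with times in seconds."""
    return f"{segment['start']:.2f}-{segment['end']:.2f} {segment['text'].strip()}"
```

For example, `transcribe("full_audio_files/example.wav", TranscriptionConfig(device="cpu", compute_type="int8", batch_size=4))` would be a reasonable starting point on a machine without a GPU; the file name here is a placeholder.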