Audio Segmentation and Transcription with WhisperX

This repository contains Python scripts that segment and transcribe audio using WhisperX, an open-source ASR (Automatic Speech Recognition) pipeline built on OpenAI's Whisper models. WhisperX transcribes audio files into timestamped text.

Purpose

This repository helps you split long audio files into smaller chunks and transcribe those chunks into text. This is useful for creating training data for ASR models, generating subtitles for videos, or extracting specific spoken content from audio recordings.

Usage

1. Prerequisites

Before using the scripts in this repository, ensure that you have the following installed:

Python 3.x
yt-dlp for downloading YouTube audio files. You can download yt-dlp from https://github.com/yt-dlp/yt-dlp/releases.
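
A quick way to confirm the prerequisites is a small check script. The following is a minimal sketch and not part of the repository; the helper names `missing_tools` and `python_is_supported` are hypothetical. Note that yt-dlp relies on ffmpeg to convert downloaded audio, so you may want to add `ffmpeg` to the list of tools checked.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_is_supported(minimum=(3, 0)):
    """Check the running interpreter against a minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.x is required")
    for name in missing_tools(["yt-dlp"]):
        print(f"{name} was not found on PATH")
```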

2. Set Up

Clone the repository to your local machine:

git clone https://github.com/your-username/audio-segmentation-transcription.git
cd audio-segmentation-transcription

Create a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate   # On Windows, use venv\Scripts\activate

Install the required packages:
```
pip install -r requirements.txt
```

3. Audio Segmentation and Transcription

Segmentation and Transcription from Local Audio Files

Place your audio files (in WAV format) inside the full_audio_files directory.
If you want to use audio from YouTube instead, list the video links in yt_links.txt.
Then run the download_yt_files.py script to download and preprocess the audio files from YouTube:
```
python download_yt_files.py
```
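
For orientation, here is a rough sketch of how a download step like this could be built on the yt-dlp command line. It is not the actual content of download_yt_files.py. It assumes yt_links.txt holds one link per line, that lines starting with `#` are comments, and that ffmpeg is installed so yt-dlp can convert to WAV. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def read_links(path):
    """Read non-empty, non-comment lines from a links file (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def build_command(url, out_dir="full_audio_files"):
    """Build a yt-dlp command that extracts audio and converts it to WAV."""
    return [
        "yt-dlp",
        "--extract-audio",        # keep only the audio track
        "--audio-format", "wav",  # convert with ffmpeg after download
        "--output", str(Path(out_dir) / "%(id)s.%(ext)s"),
        url,
    ]


def download_all(links_file="yt_links.txt", out_dir="full_audio_files"):
    """Download every link in `links_file` into `out_dir`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in read_links(links_file):
        subprocess.run(build_command(url, out_dir), check=True)
```

Calling `download_all()` from the repository root would then fill full_audio_files with one WAV file per video, named after the video ID.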
Run the main.py script to segment and transcribe the audio files:
```
python main.py
```
The segmented audio files and corresponding transcriptions will be saved in the output_audio_segments directory.
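
To give an idea of what happens during transcription, the sketch below follows the usage shown in the WhisperX README. It is not the actual content of main.py, and the exact call signatures should be checked against the WhisperX version you have installed. The model name, device, and default values are assumptions, and `transcribe_file` and `seconds_to_samples` are hypothetical helper names.

```python
def seconds_to_samples(start, end, sample_rate=16000):
    """Convert a segment's start/end times in seconds to sample indices."""
    return int(round(start * sample_rate)), int(round(end * sample_rate))


def transcribe_file(path, model_name="large-v2", device="cuda",
                    compute_type="float16", batch_size=16, language="en"):
    """Transcribe one audio file and return (start, end, text) tuples."""
    import whisperx  # imported lazily so the helper above works without it

    model = whisperx.load_model(model_name, device, compute_type=compute_type)
    audio = whisperx.load_audio(path)  # mono float32 array resampled to 16 kHz
    result = model.transcribe(audio, batch_size=batch_size, language=language)
    return [(seg["start"], seg["end"], seg["text"].strip())
            for seg in result["segments"]]
```

Each returned tuple gives the time span of one segment. With `seconds_to_samples` you can turn that span into indices for slicing the loaded audio array before writing the chunk to disk. Lowering batch_size or switching compute_type to "int8" reduces GPU memory use at some cost in speed or accuracy.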

4. Output

Segmented Audio Files: The segmented audio files (in WAV format) will be saved in the output_audio_segments/{run_name}/audio directory.
Transcriptions: The transcriptions for each segment will be saved in output_audio_segments/{run_name}/train.txt (for training data) and output_audio_segments/{run_name}/validation.txt (for validation data).
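
The layout above could be produced along the following lines. This is only an illustration: the line format `audio/<filename>|<text>`, the 10% validation share, and the function names are assumptions, so check main.py for the format the scripts really write.

```python
import random
from pathlib import Path


def split_entries(entries, validation_fraction=0.1, seed=0):
    """Shuffle (filename, text) pairs and split them into train/validation."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def write_metadata(entries, path):
    """Write one 'audio/<filename>|<text>' line per entry (assumed format)."""
    text = "".join(f"audio/{name}|{transcript}\n" for name, transcript in entries)
    Path(path).write_text(text, encoding="utf-8")


def save_run(entries, run_name, root="output_audio_segments"):
    """Create the run directory and write train.txt and validation.txt."""
    run_dir = Path(root) / run_name
    (run_dir / "audio").mkdir(parents=True, exist_ok=True)
    train, validation = split_entries(entries)
    write_metadata(train, run_dir / "train.txt")
    write_metadata(validation, run_dir / "validation.txt")
    return run_dir
```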

Notes

Ensure that you have enough storage space, especially if dealing with large audio files, as the segmented audio files can consume significant disk space.
You can adjust the parameters in the scripts (such as batch_size, compute_type, and language) to customize the behavior of the WhisperX model according to your requirements.
For more information about WhisperX, refer to its official repository at https://github.com/m-bain/whisperX.
This README was generated with a GPT model; any fixes are more than welcome 😃
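
To get a feel for the storage needed, you can estimate the size of uncompressed WAV audio from its duration. The sketch below assumes 22050 Hz mono (as the script name convert_audio_to_mono_22050.py suggests) and 16-bit samples, which is an assumption; `wav_size_bytes` is a hypothetical helper.

```python
def wav_size_bytes(duration_seconds, sample_rate=22050, channels=1,
                   bytes_per_sample=2):
    """Approximate size of uncompressed PCM audio, ignoring the WAV header."""
    return int(duration_seconds * sample_rate * channels * bytes_per_sample)


one_hour = wav_size_bytes(3600)
print(f"{one_hour / 1024**2:.1f} MiB per hour of audio")  # prints 151.4 MiB per hour of audio
```

Since the segments are cut from the full files, expect roughly this much again on top of the original recordings.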

Repository: MisterCapi/auto_dataset_tts
