TalkTales is a specialized tool designed to assist deaf and hard-of-hearing individuals in their daily interactions. While the overarching goal is to convert spoken language into text, the unique aspect of this project lies in its approach to speaker differentiation: by highlighting changes in the speaker's voice within the transcribed text, TalkTales offers enhanced contextual understanding, a feature often missing in conventional speech-to-text services.
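To make the speaker-differentiation idea concrete, here is a minimal sketch of how transcribed segments could be rendered with markers at each speaker change. The segment format (speaker id plus text) is an illustrative assumption, not TalkTales' actual internal representation:

```python
def render_transcript(segments):
    """Render (speaker_id, text) pairs, inserting a marker whenever the speaker changes."""
    lines = []
    last_speaker = None
    for speaker, text in segments:
        if speaker != last_speaker:
            # A new voice was detected: emit a visible speaker-change marker.
            lines.append(f"[Speaker {speaker}]")
            last_speaker = speaker
        lines.append(text)
    return "\n".join(lines)

segments = [
    (1, "Hi, how are you?"),
    (2, "Fine, thanks."),
    (2, "And you?"),
    (1, "Great."),
]
print(render_transcript(segments))
```

Consecutive segments from the same speaker share one marker, so the reader only sees a label when the voice actually changes.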
We leverage the open-source Vosk model for the core speech-to-text translation. However, our methodology diverges from mainstream solutions: we deliberately reduce our dependence on machine learning algorithms. The goal is not merely to build a functional tool but to deepen our understanding of sound and voice phenomena. Most of the concepts are explained in detail in the docs directory.
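As a flavor of the classical, non-ML signal analysis the project favors, here is a stdlib-only sketch of two textbook frame-level features, short-time energy and zero-crossing rate, which are commonly used to separate speech from silence. This is a generic illustration, not code from the repository:

```python
def short_time_energy(frame):
    # Mean squared amplitude of one frame: high for speech, near zero for silence.
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ; high for noisy/fricative sounds.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

silence = [0.0] * 160                              # 10 ms of silence at 16 kHz
tone = [((-1) ** i) * 0.5 for i in range(160)]     # alternating signal: maximal ZCR
print(short_time_energy(silence))  # → 0.0
print(zero_crossing_rate(tone))    # → 1.0
```

Features like these can drive simple voice-activity and speaker-change heuristics without any trained model.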
Development of this repository is currently on hold due to a heavy university workload.
- Python 3.10+ installed
- git installed
First, clone the repository to your local machine:
git clone https://github.com/kryczkal/TalkTales.git ; cd TalkTales
Before installing the dependencies, it is highly recommended to set up a virtual environment inside the project:
python -m venv myPythonEnv
source myPythonEnv/bin/activate
Then install the Python dependencies with pip:
pip install -r requirements.txt
You are ready to go
Run the application with
python main.py
or, on Linux:
./main.py
Various utilities are also provided alongside the main app. These include:
A program used to invoke the Diarizer components without the application frontend. It either loads an audio file or connects to a live stream, then writes speaker changes to stdout. If plotting is enabled (via the Settings file), it will also plot speakers across time.
python DiarizationTester.py [optional: filename]
or
./DiarizationTester.py [optional: filename]
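The optional-filename convention above can be sketched with standard `argparse`: a single optional positional argument selects file input, and omitting it falls back to the live stream. The function and return format here are illustrative assumptions, not the tester's real API:

```python
import argparse

def parse_source(argv):
    """Decide the audio source from CLI arguments: a file path, or the live stream."""
    parser = argparse.ArgumentParser(description="Run diarization without the GUI")
    parser.add_argument("filename", nargs="?", default=None,
                        help="audio file to diarize; omit to use the live stream")
    args = parser.parse_args(argv)
    return ("file", args.filename) if args.filename else ("live", None)

print(parse_source(["sample.wav"]))  # → ('file', 'sample.wav')
print(parse_source([]))              # → ('live', None)
```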
The "Suggestions" folder contains an experimental solution to the artifacts produced by speech-to-text algorithms. These algorithms, while impressive, are not flawless, and even minor errors can significantly impede the flow of a conversation. To mitigate this, we implemented a second layer that scans the transcribed sentences for anomalies: when it identifies an 'unlikely' word or phrase, it flags it and offers a more probable alternative. This enhancement leverages the HerBERT language model. Unfortunately, this approach has a limitation: it is too slow for real-time applications.
It can be tested with:
python src/suggestions/testing.py
or
./src/suggestions/testing.py
The script will prompt for input sentences, search them for improbable utterances, and suggest improvements.
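The flag-and-suggest idea can be illustrated with a toy sketch: score each word with a language-model probability and flag those below a threshold. The scorer below is a stand-in dictionary purely for demonstration; the real project uses a HerBERT-based model, which is what makes the approach too slow for real time:

```python
# Stand-in word scores; a real implementation would query a language model.
TOY_SCORES = {"the": 0.9, "cat": 0.7, "sat": 0.6, "qzx": 0.01}

def flag_unlikely(words, threshold=0.05, score=lambda w: TOY_SCORES.get(w, 0.5)):
    """Return (word, is_flagged) pairs; a word is flagged when its score is below the threshold."""
    return [(w, score(w) < threshold) for w in words]

for word, flagged in flag_unlikely(["the", "cat", "qzx"]):
    print(f"{word}: {'UNLIKELY' if flagged else 'ok'}")
```

A flagged word would then be handed to the model again to generate a more probable replacement.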
The "Matlab" folder contains a suite of streamlined scripts designed for acquiring sound samples in diverse settings. These scripts are integral to our broader goal of analyzing human speech in various environments and building a speech diarization tool. To this end, we generated a range of auditory visualizations, including wave plots, spectrograms, and mel spectrograms. During the initial phases of research and development, these visualizations served as foundational references, deepening our understanding of the acoustics and patterns of human speech.
- Simple diarization model
- Multithreaded backend design
- Diarization model tuning and upgrades
- Multi-language support
- More detailed examples
- Android application front-end
- Lukasz Kryczka
- Jakub Lisowski
- Tomasz Mycielski
- Michal Kwiatkowski
- Ernest Molczan
- Wojtek Matejuk
- Mateusz Mikiciuk
- Sofiia Kuzmenko
Distributed under the MIT License. See LICENSE.txt for more information.