Oráculo is a versatile CLI and web application for transcribing audio and performing semantic search. It uses Sentence Transformers embeddings to build a compact search engine that helps retrieve and organize important information from a collection of documents.
It is particularly useful for professionals who work with large amounts of audio and need an efficient way to transcribe it and run semantic searches over the resulting text.
- Audio Transcription: Oráculo can transcribe a single audio file, bulk transcribe a whole folder, or transcribe YouTube videos.
- Semantic Search: A web app to perform semantic searches on the transcribed audio data.
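Semantic search ranks stored text by how close its embedding is to the embedding of the query, commonly measured with cosine similarity. The sketch below illustrates only that ranking step and is not Oráculo's code. The `embed` function is a hypothetical bag-of-words stand-in for a Sentence Transformers model, so it only matches shared words, whereas a real model would also match paraphrases. The names `embed`, `cosine`, `search` and the sample transcripts are made up for illustration.

```python
import math
from collections import Counter


# Hypothetical toy embedding: word counts instead of a Sentence Transformers model.
# A real model returns dense vectors that capture meaning, not just shared words.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over the shared vocabulary divided by the product of the vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, documents: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    # Embed the query once, score every document, and return the best matches first.
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


transcripts = [
    "the quarterly budget meeting covered marketing costs",
    "interview about hiring a new backend engineer",
    "notes on the marketing budget for next year",
]
for score, doc in search("marketing budget", transcripts, top_k=2):
    print(f"{score:.2f}  {doc}")
```

With many documents, a vector store such as ChromaDB, which Oráculo uses to store embeddings, would typically handle this nearest-neighbor lookup instead of scoring every document in a Python loop.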
- Python 3.10
- FFmpeg
- Git
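Before installing, you can confirm that these prerequisites are available. This is a minimal sketch, and `check_prerequisites` is a hypothetical helper, not part of Oráculo. It assumes Python 3.10 or newer is acceptable (the list above names only 3.10) and checks that the `ffmpeg` and `git` executables can be found on PATH.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    # Assumption: any Python from 3.10 onward satisfies the "Python 3.10" requirement.
    results = {"python>=3.10": sys.version_info >= (3, 10)}
    # FFmpeg and Git must be callable as commands, so look them up on PATH.
    for tool in ("ffmpeg", "git"):
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```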
You can install Oráculo with pip:
pip install oraculo
Initialize the Oráculo application with the following command:
oraculo init
You will be prompted to enter the following information:
| Information | Description |
|---|---|
| ChromaDB Persist Directory | Directory where ChromaDB persists its data, including the vector embeddings of the text. |
| ChromaDB Implementation | Defaults to duckdb+parquet. For other implementations, refer to the source code. |
To change the configuration later, run `oraculo init` again.
To start the semantic search web app, use the following command:
oraculo webapp
To transcribe a single file:
oraculo transcribe
To bulk transcribe all files in a folder:
oraculo bulk-transcribe
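Bulk transcription processes a whole folder at once. As a rough illustration of the first step, gathering the files to transcribe, here is a minimal sketch that collects audio files from a folder. It is not Oráculo's implementation: the `find_audio_files` helper and the `AUDIO_EXTENSIONS` set are hypothetical, since the README does not say which formats are accepted or whether subfolders are included.

```python
from pathlib import Path

# Hypothetical set of extensions; Oráculo's supported formats are not documented here.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def find_audio_files(folder: str, recursive: bool = False) -> list[Path]:
    root = Path(folder)
    # glob("*") lists only the top level; rglob("*") also walks every subfolder.
    candidates = root.rglob("*") if recursive else root.glob("*")
    # Keep regular files whose extension matches case-insensitively, in a stable sorted order.
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Sorting the result keeps the processing order predictable, which makes it easier to see which files a long batch run has already handled.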
To transcribe YouTube videos:
oraculo transcribe-yt
For help with the available commands, run:
oraculo --help
- Version: 0.1.14
- Author: Joao Tedeschi
- Contact: [email protected]
Oráculo aims to improve information retrieval for businesses and individual users. Feel free to reach out with feedback or suggestions for improving it further.