Project: Oráculo

Oráculo is a command-line and web application for audio transcription and semantic search. It uses Sentence Transformers embeddings to build a compact search engine that helps you retrieve and organize important information from a collection of documents.

It is particularly useful for professionals who work with large amounts of audio and need an efficient way to transcribe it and run semantic searches over the results.

Features:

Audio Transcription: Oráculo can transcribe audio files. You can transcribe a single file or bulk transcribe a folder.
Semantic Search: A web app to perform semantic searches on the transcribed audio data.

Requirements:

⚠️ IMPORTANT ⚠️ To run Oráculo, you need the following installed on your machine:

Python 3.10
FFmpeg
Git
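
If you want to confirm these requirements before installing, a small check like the one below can help. It is only a sketch: the function name `check_requirements` is hypothetical and not part of Oráculo, and it treats Python 3.10 as a minimum version, which is an assumption (the list above only names 3.10).

```python
import shutil
import sys


def check_requirements(min_python=(3, 10), tools=("ffmpeg", "git")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Assumption: 3.10 is treated as a minimum, not an exact version pin.
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted} or newer expected, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all requirements found")
```

An empty result means Python, FFmpeg, and Git were all found; it does not verify FFmpeg or Git versions.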

Installation:

You can install Oráculo with pip:

pip install oraculo

Setup:

⚠️⚠️Warning⚠️⚠️: The following steps are required to run Oráculo. Please follow the steps carefully.

Initialize the Oráculo application with the following command:

oraculo init

You will be prompted to enter the following information:

Information	Description
ChromaDB Persist Directory	The directory where the ChromaDB database is stored. It holds the vector embeddings of the text
ChromaDB Implementation	Defaults to `duckdb+parquet`. For other implementations, please refer to the source code

To change the configuration later, run the same command again.

Usage:

Semantic Search:

To start the Semantic Search Application, use the following command:

oraculo webapp
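
For readers new to the idea, semantic search ranks documents by how close their embedding vectors are to the query's embedding. The sketch below illustrates only that ranking step, using cosine similarity on hand-made 3-dimensional vectors. It is not Oráculo's implementation: the vectors, file names, and the `rank` helper are made up for illustration, and a real setup would get its vectors from a Sentence Transformers model and store them in ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical data).
docs = {
    "meeting.txt": [0.9, 0.1, 0.0],
    "interview.txt": [0.1, 0.8, 0.2],
    "lecture.txt": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

results = rank(query, docs, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, `meeting.txt` ranks first because its vector points in nearly the same direction as the query.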

Single File Transcription:

To initiate a transcription for a single file:

oraculo transcribe

Multiple File Transcription:

To initiate bulk transcription for a folder:

oraculo bulk-transcribe
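
Before starting a bulk run, it can be handy to preview which files in a folder look like audio. The helper below is a sketch for that purpose only. How Oráculo itself scans a folder is not documented here, so the extension set and the `find_audio_files` name are assumptions.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to match your own files.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(folder):
    """Return audio files directly inside `folder`, sorted by path."""
    root = Path(folder)
    return sorted(
        p for p in root.iterdir()
        # Skip directories and compare extensions case-insensitively.
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_audio_files("."):
        print(path.name)
```

The helper does not descend into subfolders; switch `iterdir()` to `rglob("*")` if you want a recursive listing.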

YouTube Video Transcription:

To transcribe YouTube videos:

oraculo transcribe-yt

Help:

If you need help with the commands, use the following command:

oraculo --help

About

Version: 0.1.14
Author: Joao Tedeschi
Contact: [email protected]

Oráculo is developed to improve information retrieval for businesses and individual users. Please feel free to reach out with any feedback or suggestions for improving it.
