Summary

The system can output a given Japanese presentation video/audio as audio translated into English. Corresponding scripts are also output. This can be run on Google Colaboratory.

Usage

Run main.ipynb written in Jupyter Notebook format. When you run main(), you will be asked for the path to the file, so enter that (can be local or remote).

Read the given file, this program will execute

Japanese transcription (SpeechFlow)
Japanese modification (ChatGPT)
translate Japanese into English (Deepl)
English speech synthesis

The audio and scripts are output to the output folder (If the output folder does not exist, it is automatically generated in the root directory).

The Japanese script is output in its entirety, while the English audio is separated approximately every 20 minutes. The English script is divided and output along with the audio.

Preparation

Common to all

Before use this program, you need to import the libraries described in requirements.txt.

This one is basically intended to run on Google Colaboratory. If you want to run it in your local environment, please change the following variables to suit your environment.

env_path
context_path
prompt_fix_jp_path (optional)
output_dir

API related

Also, create .env file in root directory and write variables (API keys) as follows:

SpeechFlow

SPEECHFLOW_API_KEY_ID="hogehoge"
SPEECHFLOW_API_KEY_SECRET=hogehoge"

OpenAI ChatGPT

OPENAI_API_KEY="hogehoge"

Genny

GENNY_API_URL="hogehoge"
GENNY_API_KEY="hogehoge""
GENNY_SPEAKER="hogehoge""
GENNY_SPEAKER_STYLE="hogehoge""

Deepl

DEEPL_AUTH_KEY="hogehoge"

Note on API

This system uses the API of the following services:

SpeechFlow - A real-time speech-to-text API service with live audio streaming capabilities[1]
OpenAI - Provides various AI models including GPT for text generation and analysis[2]
DeepL - High-quality language translation API with support for multiple languages[3]
Genny - A code-generation tool for Go that enables generic programming[4]

Each service provides specific functionalities:

SpeechFlow handles real-time audio transcription and processing[1]
OpenAI offers advanced language models and AI capabilities[2]
DeepL provides professional-grade translation services[3]
Genny facilitates code generation and generic programming[4]

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
data		data
.gitignore		.gitignore
README.md		README.md
main.ipynb		main.ipynb
requirements.txt		requirements.txt
簡単な使い方.md		簡単な使い方.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Summary

Usage

Preparation

Common to all

API related

SpeechFlow

OpenAI ChatGPT

Genny

Deepl

Note on API

About

Releases

Packages

Languages

ll-0013py/VoiceTranslateFlow

Folders and files

Latest commit

History

Repository files navigation

Summary

Usage

Preparation

Common to all

API related

SpeechFlow

OpenAI ChatGPT

Genny

Deepl

Note on API

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages