The system can output a given Japanese presentation video/audio as audio translated into English. Corresponding scripts are also output. This can be run on Google Colaboratory.
Run main.ipynb written in Jupyter Notebook format. When you run main(), you will be asked for the path to the file, so enter that (can be local or remote).
Read the given file, this program will execute
- Japanese transcription (SpeechFlow)
- Japanese modification (ChatGPT)
- translate Japanese into English (Deepl)
- English speech synthesis
The audio and scripts are output to the output folder (If the output folder does not exist, it is automatically generated in the root directory).
The Japanese script is output in its entirety, while the English audio is separated approximately every 20 minutes. The English script is divided and output along with the audio.
Before use this program, you need to import the libraries described in requirements.txt.
This one is basically intended to run on Google Colaboratory. If you want to run it in your local environment, please change the following variables to suit your environment.
- env_path
- context_path
- prompt_fix_jp_path (optional)
- output_dir
Also, create .env
file in root directory and write variables (API keys) as follows:
- SPEECHFLOW_API_KEY_ID="hogehoge"
- SPEECHFLOW_API_KEY_SECRET=hogehoge"
- OPENAI_API_KEY="hogehoge"
- GENNY_API_URL="hogehoge"
- GENNY_API_KEY="hogehoge""
- GENNY_SPEAKER="hogehoge""
- GENNY_SPEAKER_STYLE="hogehoge""
DEEPL_AUTH_KEY="hogehoge"
This system uses the API of the following services:
- SpeechFlow - A real-time speech-to-text API service with live audio streaming capabilities[1]
- OpenAI - Provides various AI models including GPT for text generation and analysis[2]
- DeepL - High-quality language translation API with support for multiple languages[3]
- Genny - A code-generation tool for Go that enables generic programming[4]
Each service provides specific functionalities:
- SpeechFlow handles real-time audio transcription and processing[1]
- OpenAI offers advanced language models and AI capabilities[2]
- DeepL provides professional-grade translation services[3]
- Genny facilitates code generation and generic programming[4]