A speech transcription and translation application using OpenAI's Whisper AI model.
- Features
- User Requirements
- Download & Installation
- General Usage
- User Settings
- Development
- Contributing
- License
- Attribution
- Other
- Speech to text
- Translation of transcribed text (speech to translated text)
- Real-time input from microphone and speaker
- Batch file processing with timestamps
- FFmpeg must be installed and added to the PATH environment variable. You can download it from the official FFmpeg website and add it to your PATH manually, OR do both automatically using one of the following commands:
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
- Whisper processes audio on the GPU (using VRAM) when it can, so a CUDA-compatible GPU is recommended. If there is no compatible GPU, the application falls back to the CPU, which can be noticeably slower. To check the requirements of each model, see the Whisper repository or hover over the model selection in the app to show a tooltip with the model info.
- Speaker input only works on Windows 8 and above.
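Both requirements above can be checked from Python before launching the app. The following is a hypothetical helper script, not part of the project: it uses `shutil.which` to look for `ffmpeg` on PATH, and assumes that "Windows 8 and above" maps to a Windows version of 6.2 or higher for the speaker-input check.

```python
import shutil
import sys
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


def speaker_input_supported() -> bool:
    """True on Windows 8 or later, where speaker input is said to work.

    Windows 8 reports itself as version 6.2, so (major, minor) >= (6, 2) qualifies.
    """
    if sys.platform != "win32":
        return False
    ver = sys.getwindowsversion()  # only exists on Windows
    return (ver.major, ver.minor) >= (6, 2)


if __name__ == "__main__":
    print("ffmpeg:", find_ffmpeg() or "NOT FOUND on PATH")
    print("speaker input supported:", speaker_input_supported())
```

If `find_ffmpeg()` returns `None` right after installing FFmpeg, the terminal may need to be reopened so it picks up the updated PATH.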
- Make sure that you have installed FFmpeg and added it to the PATH environment variable. See the User Requirements section for more info.
- Download the latest release from the project's releases page
- Install or extract the downloaded file
- Run the program
- Set your user settings
- Select a model
- Select the mode and language
- Click the record button
- Stop recording
- (Optionally) export the result to a file
- You can change the settings by clicking the settings button in the app's menu bar. Alternatively, press F2 while the app is in focus to open the settings window, or edit the settings file manually at $project_dir/user/setting.json.
- If the terminal/console window is still showing, set your default terminal application to Windows Console Host in your Windows Terminal settings.
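Since the settings live in a plain JSON file, a manual edit amounts to loading the file, changing a key and writing it back. The sketch below is illustrative only: the README does not document the schema of setting.json, so it treats the file as a flat JSON object and uses no real key names.

```python
import json
from pathlib import Path
from typing import Any


def load_settings(path: Path) -> dict:
    """Read the JSON settings file into a dict."""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def update_setting(path: Path, key: str, value: Any) -> dict:
    """Set one top-level key, write the file back, and return the new settings."""
    settings = load_settings(path)
    settings[key] = value
    # indent=4 keeps the file readable for later manual edits
    with path.open("w", encoding="utf-8") as f:
        json.dump(settings, f, indent=4, ensure_ascii=False)
    return settings


if __name__ == "__main__":
    # Run from $project_dir; lists the current settings without changing anything
    settings_file = Path("user") / "setting.json"
    for key, value in load_settings(settings_file).items():
        print(f"{key} = {value!r}")
```

To change an entry, something like `update_setting(Path("user") / "setting.json", "some_key", value)` would work, where `"some_key"` is a placeholder for one of the key names printed by the script.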
Warning
As of right now (4th of November 2022), PyTorch does not appear to be compatible with Python 3.11, so you can't use Python 3.11. I tried 3.11 but it didn't work, so I rolled back to Python 3.10.9.
Note
Ignore all this if you are using the release/compiled version.
Note
It is recommended to create a virtual environment, but it is not required.
- Create your virtual environment by running
python -m venv venv
- Activate your virtual environment by running
source venv/bin/activate
(on Windows, run venv\Scripts\activate instead)
- Install all the required dependencies by running
devSetup.py
located in the root directory, or install the packages yourself by running pip install -r requirements.txt
- Make sure you have FFmpeg installed and added to your PATH
- Go to the root directory and run the script by typing
python Main.py
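The manual steps above (check the Python version, create the venv, install requirements.txt) can be scripted. This is an independent sketch, not a description of what devSetup.py does. It assumes the Python 3.11 limit from the warning above still applies, which may no longer be true, and it calls pip through the venv's own interpreter so the environment does not need to be activated first.

```python
import os
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Interpreter inside a venv: Scripts\\python.exe on Windows, bin/python elsewhere."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def python_version_ok(version=sys.version_info) -> bool:
    """Reject Python 3.11+, per the November 2022 PyTorch compatibility warning."""
    return tuple(version[:2]) < (3, 11)


def setup(venv_dir: Path = Path("venv"), requirements: Path = Path("requirements.txt")) -> None:
    """Create the venv if it is missing, then install requirements.txt into it."""
    if not python_version_ok():
        sys.exit(f"Python {sys.version.split()[0]} may not be supported by PyTorch; try 3.10")
    if not venv_dir.exists():
        venv.create(venv_dir, with_pip=True)
    # Use the venv's interpreter directly instead of relying on an activated shell
    subprocess.run(
        [str(venv_python(venv_dir)), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )


if __name__ == "__main__":
    setup()
```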
Note
This process could be handled automatically by running devSetup.py
To use the GPU, you first need to uninstall torch,
then go to the official PyTorch website to install the correct version of PyTorch
with GPU support for your system.
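To confirm which PyTorch build ended up in the environment, the check below may help. It is a sketch that relies only on `torch.cuda.is_available()`, `torch.version.cuda` (which is `None` for CPU-only builds) and `torch.cuda.get_device_name()`. It reports what the installed build can do. It does not show which device the app itself picks, although openai-whisper's `load_model` defaults to CUDA whenever `torch.cuda.is_available()` is true.

```python
def torch_device_report() -> dict:
    """Describe which device the installed PyTorch build can use."""
    try:
        import torch  # optional here, so the check also runs before torch is installed
    except ImportError:
        return {"installed": False, "torch_version": None, "cuda_build": None,
                "cuda_available": False, "device": "cpu"}
    cuda_available = torch.cuda.is_available()
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only wheels, a version string such as "11.7" for CUDA wheels
        "cuda_build": torch.version.cuda,
        "cuda_available": cuda_available,
        "device": torch.cuda.get_device_name(0) if cuda_available else "cpu",
    }


if __name__ == "__main__":
    for key, value in torch_device_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is `None`, the CPU-only wheel is installed. Run `pip uninstall torch` and reinstall using the command shown on the PyTorch website.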
Before compiling the project, make sure you have installed all the dependencies and set up PyTorch correctly. Your PyTorch version controls whether the app will use the GPU or the CPU (that's why it's recommended to create a virtual environment for the project).
I have provided a [build script](./build.py)
that will build the project for you. You can run it by typing python build.py
in the root directory. This will produce an executable in the dist
directory. An active Python virtual environment is required to run the script. Alternatively, you can use the following command to build the project:
pyinstaller --noconfirm --onedir --console --icon "./assets/icon.ico" --name "Speech Translate" --clean --add-data "./assets;assets/" --copy-metadata "tqdm" --copy-metadata "regex" --copy-metadata "requests" --copy-metadata "packaging" --copy-metadata "filelock" --copy-metadata "numpy" --copy-metadata "tokenizers" --add-data "./venv/Lib/site-packages/whisper/assets;whisper/assets/" "./Main.py"
Note: Replace ./venv with the path to your actual virtual environment / Python installation.
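To avoid hard-coding the venv path from the note above, the same PyInstaller flags can be assembled in Python and passed to `PyInstaller.__main__.run`. The following is a hypothetical sketch, not the project's build.py. It assumes the `whisper` module found by `importlib.util.find_spec` is openai-whisper, and it uses `os.pathsep` as the `--add-data` separator (`;` on Windows, `:` elsewhere).

```python
import importlib.util
import os
from pathlib import Path
from typing import List, Optional


def find_whisper_assets() -> Optional[Path]:
    """Locate whisper/assets in the active environment without importing whisper."""
    spec = importlib.util.find_spec("whisper")
    if spec is None or not spec.submodule_search_locations:
        return None
    assets = Path(list(spec.submodule_search_locations)[0]) / "assets"
    return assets if assets.is_dir() else None


def pyinstaller_args(whisper_assets: Path) -> List[str]:
    """The flags of the pyinstaller command above, as an argument list."""
    sep = os.pathsep  # --add-data separator: ";" on Windows, ":" elsewhere
    args = [
        "--noconfirm", "--onedir", "--console",
        "--icon", "./assets/icon.ico",
        "--name", "Speech Translate",
        "--clean",
        "--add-data", f"./assets{sep}assets/",
    ]
    for pkg in ("tqdm", "regex", "requests", "packaging", "filelock", "numpy", "tokenizers"):
        args += ["--copy-metadata", pkg]
    args += ["--add-data", f"{whisper_assets}{sep}whisper/assets/", "./Main.py"]
    return args


if __name__ == "__main__":
    import PyInstaller.__main__  # requires: pip install pyinstaller

    assets = find_whisper_assets()
    if assets is None:
        raise SystemExit("whisper is not installed in the active environment")
    PyInstaller.__main__.run(pyinstaller_args(assets))
```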
This project should be compatible with Windows (preferably Windows 10 or later) and other platforms, but I haven't tested it on platforms other than Windows.
Feel free to contribute to this project by forking the repository, making your changes, and submitting a pull request. You can also contribute by creating an issue if you find a bug or have a feature request. Also, feel free to give this project a star if you like it.
This project is licensed under the MIT License - see the LICENSE file for details
- Sunvalley TTK Theme (used for the app theme, although I modified it a bit)
Check out my other similar project, Screen Translate, a screen translator / OCR tool made possible using Tesseract.