Linux desktop and Sailfish OS app for note taking, reading and translating with offline Speech to Text, Text to Speech and Machine Translation
Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing takes place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected: no data is sent to the Internet.
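Speech Note uses many different processing engines to do its job. Currently, the following are used:
- Speech to Text (STT)
  - Coqui STT (DeepSpeech)
  - Whisper (whisper.cpp and Faster Whisper)
  - Vosk
  - April-ASR
- Text to Speech (TTS)
  - Piper
  - RHVoice
  - espeak-ng
  - MBROLA
  - Coqui TTS
  - Mimic 3
- Machine Translation (MT)
  - Bergamot Translator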
The following languages are supported:
Lang ID | Name | DeepSpeech (STT) | Whisper (STT) | Vosk (STT) | April-ASR (STT) | Piper (TTS) | RHVoice (TTS) | espeak (TTS) | MBROLA (TTS) | Coqui (TTS) | Mimic3 (TTS) | Bergamot (MT) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
af | Afrikaans | ● | ● | ● | ||||||||
am | Amharic | ● (e) | ● | ● | ● | |||||||
ar | Arabic | ● | ● | ● | ● | ● | ● | ● | ||||
bg | Bulgarian | ● | ● | ● | ||||||||
bn | Bengali | ● | ● | ● | ● | |||||||
bs | Bosnian | ● | ● | |||||||||
ca | Catalan | ● | ● | ● | ● | ● | ● | ● | ||||
cs | Czech | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||
da | Danish | ● | ● | ● | ● | ● | ||||||
de | German | ● | ● | ● | ● | ● | ● | ● | ● | |||
el | Greek | ● (e) | ● | ● | ● | ● | ● | |||||
en | English | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
eo | Esperanto | ● | ● | ● | ||||||||
es | Spanish | ● | ● | ● | ● | ● | ● | ● | ● | |||
et | Estonian | ● (e) | ● | ● | ● | ● | ● | |||||
eu | Basque | ● (e) | ● | ● | ● | |||||||
fa | Persian | ● | ● | ● | ● | ● | ● | ● | ● | |||
fi | Finnish | ● | ● | ● | ● | ● | ● | |||||
fr | French | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||
ga | Irish | ● | ● | |||||||||
gu | Gujarati | ● | ● | ● | ||||||||
ha | Hausa | ● | ● | |||||||||
he | Hebrew | ● | ● | |||||||||
hi | Hindi | ● | ● | ● | ||||||||
hr | Croatian | ● | ● | ● | ● | |||||||
hu | Hungarian | ● (e) | ● | ● | ● | ● | ● | ● | ||||
id | Indonesian | ● (e) | ● | ● | ● | ● | ||||||
is | Icelandic | ● | ● | ● | ● | ● | ||||||
it | Italian | ● | ● | ● | ● | ● | ● | ● | ● | |||
ja | Japanese | ● | ● | ● | ● | |||||||
jv | Javanese | ● | ● | |||||||||
ka | Georgian | ● | ● | ● | ● | |||||||
kk | Kazakh | ● | ● | ● | ● | ● | ||||||
ko | Korean | ● | ● | ● | ● | |||||||
ky | Kyrgyz | ● | ● | |||||||||
la | Latin | ● | ● | |||||||||
lb | Luxembourgish | ● | ||||||||||
lt | Lithuanian | ● | ● | ● | ● | |||||||
lv | Latvian | ● | ● | ● | ● | |||||||
mk | Macedonian | ● | ● | ● | ||||||||
mn | Mongolian | ● (e) | ● | ● | ||||||||
mr | Marathi | ● | ● | |||||||||
ms | Malay | ● | ● | ● | ● | |||||||
mt | Maltese | ● | ● | ● | ||||||||
ne | Nepali | ● | ● | ● | ● | |||||||
nl | Dutch | ● (e) | ● | ● | ● | ● | ● | ● | ● | |||
no | Norwegian | ● | ● | ● | ● | |||||||
pl | Polish | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
pt | Portuguese | ● (e) | ● | ● | ● | ● | ● | ● | ● | |||
ro | Romanian | ● (e) | ● | ● | ● | ● | ● | |||||
ru | Russian | ● | ● | ● | ● | ● | ● | ● | ● | |||
sk | Slovak | ● | ● | ● | ● | ● | ||||||
sl | Slovenian | ● (e) | ● | ● | ● | |||||||
sq | Albanian | ● | ● | ● | ● | |||||||
sr | Serbian | ● | ● | ● | ● | |||||||
sv | Swedish | ● | ● | ● | ● | ● | ● | ● | ||||
sw | Swahili | ● | ● | ● | ● | ● | ||||||
te | Telugu | ● | ● | ● | ||||||||
th | Thai | ● (e) | ● | ● | ● | |||||||
tl | Tagalog | ● | ● | ● | ||||||||
tn | Tswana | ● | ● | ● | ||||||||
tr | Turkish | ● (e) | ● | ● | ● | ● | ● | ● | ||||
tt | Tatar | ● | ● | ● | ● | |||||||
uk | Ukrainian | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||
uz | Uzbek | ● | ● | ● | ● | |||||||
vi | Vietnamese | ● | ● | ● | ● | ● | ||||||
yo | Yoruba | ● (e) | ● | ● | ● | |||||||
zh | Chinese | ● | ● | ● | ● | ● | ● |
(e) experimental, most likely doesn't work well
(*) Coqui TTS models are only available on x86-64
Language models can be downloaded directly from the app.
Details of models which are currently configured for download are described in models.json (GitHub) or models.json (GitLab).
Any contribution is very welcome!
The project is hosted on both GitHub and GitLab. Feel free to make a PR/MR, report an issue or request a new feature on whichever platform you prefer.
Translation files in Qt format are in the translations directory (GitHub) or translations directory (GitLab).
The preferred way to contribute translations is via the Transifex service, but direct PRs/MRs are welcome as well.
- Linux Desktop: Flatpak
- Arch Linux (AUR): dsnote-git
- Sailfish OS: OpenRepos
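On a desktop, installing from Flathub or the AUR comes down to a few commands. The sketch below wraps them in two shell functions. It assumes Flathub is already configured as a remote named `flathub` and uses the application ID `net.mkiol.SpeechNote` taken from the Flatpak manifest name; the AUR clone URL follows the standard `https://aur.archlinux.org/<package>.git` pattern. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences for previewing the commands, not part of Speech Note.

```shell
# Source this file, then call install_flathub or install_aur.

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install Speech Note from Flathub, then start it.
# Assumes a remote named "flathub"; the app ID comes from the Flatpak manifest name.
install_flathub() {
  run flatpak install flathub net.mkiol.SpeechNote || return
  run flatpak run net.mkiol.SpeechNote
}

# Build and install the dsnote-git AUR package with plain makepkg (no AUR helper).
install_aur() {
  run git clone https://aur.archlinux.org/dsnote-git.git || return
  run cd dsnote-git || return
  run makepkg -si
}
```

For example, `DRY_RUN=1 install_aur` only prints the three commands it would run.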
The Flatpak package published on Flathub includes almost all the dependencies needed to run every feature of the application, including CUDA, ROCm, Torch and Python libraries. As a result, the package is large and needs a significant amount of disk space after installation. If you don't need all the features, you can use the much smaller "Tiny" package (available on the Releases page), which contains only the basic features.
Comparison between "Flathub" and "Tiny" Flatpak packages:
Feature | Flathub | Tiny |
---|---|---|
Coqui/DeepSpeech STT | ✔️ | ✔️ |
Vosk STT | ✔️ | ✔️ |
Whisper STT | ✔️ | ✔️ |
Whisper STT GPU | ✔️ | ✘ |
Faster Whisper STT | ✔️ | ✘ |
April-ASR STT | ✔️ | ✔️ |
eSpeak TTS | ✔️ | ✔️ |
MBROLA TTS | ✔️ | ✔️ |
Piper TTS | ✔️ | ✔️ |
RHVoice TTS | ✔️ | ✔️ |
Coqui TTS | ✔️ | ✘ |
Mimic3 TTS | ✔️ | ✘ |
Punctuation restoration | ✔️ | ✘ |
Translator | ✔️ | ✔️ |
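The Tiny package is downloaded as a file rather than installed from a remote. Assuming it is shipped as a single-file Flatpak bundle (check the Releases page for the actual format and file name), it can be installed with `flatpak install --bundle`. The `install_tiny` function and the `DRY_RUN` preview switch below are hypothetical helpers for illustration.

```shell
# Source this file, then call install_tiny with the path of the downloaded bundle.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi; }

# Install a single-file Flatpak bundle (assumed format of the Tiny package)
# into the current user's installation.
install_tiny() {
  local bundle=${1:-}
  if [ ! -f "$bundle" ]; then
    echo "bundle not found: $bundle" >&2
    return 1
  fi
  run flatpak install --user --bundle "$bundle"
}
```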
It is also possible to build and install the latest development (git) or latest stable (release) version from the repository using the provided PKGBUILD files (note that the remarks about building on Linux apply here as well):
```
git clone <git repository url>
cd dsnote/arch/git # build latest git version
# or
cd dsnote/arch/release # build latest release version
makepkg -si
```
To build the Flatpak package from sources:
```
git clone <git repository url>
cd dsnote/flatpak
flatpak-builder --user --install-deps-from=flathub --repo="/path/to/local/flatpak/repo" "/path/to/output/dir" net.mkiol.SpeechNote.yaml
```
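When testing a local change, it can be handier to let flatpak-builder install the result directly into the user installation instead of exporting it to a repository. A minimal sketch, assuming it is run from the `dsnote/flatpak` directory: `--install` and `--force-clean` (wipe the output directory first) are standard flatpak-builder options, while the `build-dir` default and the `build_and_run` function are hypothetical.

```shell
# Source this file from the dsnote/flatpak directory, then call build_and_run.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi; }

build_and_run() {
  local manifest=net.mkiol.SpeechNote.yaml
  local build_dir=${1:-build-dir}   # hypothetical output directory name
  # Fetch missing SDK/runtime from Flathub, rebuild from scratch and install for this user.
  run flatpak-builder --user --install --force-clean \
      --install-deps-from=flathub "$build_dir" "$manifest" || return
  # Launch the freshly installed build.
  run flatpak run net.mkiol.SpeechNote
}
```

Rebuilding with `--force-clean` is slower but avoids stale files from a previous build ending up in the installed app.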
To build the Sailfish OS package with the Sailfish SDK (`sfdk`):
```
git clone <git repository url>
cd dsnote
mkdir build
cd build
sfdk config --session specfile=../sfos/harbour-dsnote.spec
sfdk config --session target=SailfishOS-4.4.0.58-aarch64
sfdk cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_SFOS=ON -DWITH_PY=OFF
sfdk package
```
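Sailfish OS devices come in several CPU architectures, so packages are often built once per target. The loop below repeats the sfdk steps for each architecture in its own build directory. The `armv7hl` and `i486` target names are assumptions modeled on the `aarch64` one and must match the targets actually installed in your SDK; `build_sfos` and `SFOS_RELEASE` are hypothetical names.

```shell
# Source this file from the dsnote source root, then call e.g.
#   build_sfos aarch64 armv7hl i486
run() { if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi; }

# Build one package per target architecture, each in its own build directory.
# Target names are assumed to follow the SailfishOS-<release>-<arch> pattern.
build_sfos() {
  local release=${SFOS_RELEASE:-4.4.0.58} arch status
  for arch in "$@"; do
    run mkdir -p "build-$arch" || return
    run cd "build-$arch" || return
    run sfdk config --session specfile=../sfos/harbour-dsnote.spec &&
      run sfdk config --session "target=SailfishOS-$release-$arch" &&
      run sfdk cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_SFOS=ON -DWITH_PY=OFF &&
      run sfdk package
    status=$?
    # Always return to the source root before reporting a failure.
    run cd .. || return
    [ "$status" -eq 0 ] || return "$status"
  done
}
```

Separate build directories keep each target's CMake cache apart, so switching targets does not reuse a configuration made for another architecture.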
Speech Note has many build-time and run-time dependencies, including shared and static libraries, 3rd-party executables, and Python and Perl scripts. Because of this complexity, the recommended way to build is the Flatpak toolchain (the Flatpak manifest file and flatpak-builder). A direct build (i.e. without Flatpak) is also possible, but more complicated.
```
git clone <git repository url>
cd dsnote
mkdir build
cd build
cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_DESKTOP=ON
make
```
To build without support for Python components, add `-DWITH_PY=OFF` to the cmake step.
To see other build options, search for `option(BUILD_XXX)` in the `CMakeLists.txt` file.
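A quick way to list these options without opening the file is to grep for the `option(` declarations. A minimal sketch, assuming it is run from the source root; `list_build_options` is a hypothetical helper name. After a first configure run, `cmake -LH .` executed in the build directory also lists the cached options together with their help strings.

```shell
# Source this file, then run list_build_options from the dsnote source root.
# Print every option(...) declaration found in a CMakeLists.txt file.
list_build_options() {
  local file=${1:-CMakeLists.txt}
  # CMake command names are case-insensitive, so match option( and OPTION( alike.
  grep -Ei '^[[:space:]]*option[[:space:]]*\(' "$file"
}
```

Any listed option can then be passed to the cmake step as `-D<NAME>=ON` or `-D<NAME>=OFF`.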
Speech Note relies on the following open source projects:
- Qt
- Coqui STT
- Coqui TTS
- Vosk
- whisper.cpp
- WebRTC VAD
- libarchive
- RNNoise-nu
- {fmt}
- Hugging Face Transformers
- Piper
- RHVoice
- ssplit-cpp
- espeak-ng
- bergamot-translator
- Rubber Band Library
- simdjson
- uroman
- astrunc
- FFmpeg
- LAME
- Vorbis
- TagLib
- libnumbertext
- KDBusAddons
- QHotkey
- faster-whisper
- Mimic 3
- Unikud
- april-asr
- Opus
Screenshots, demos and reviews:
- Screenshots
- Speech Note video demo (Speech Note 4.0)
- Translator feature video demo (Speech Note 4.0)
- Translator feature video demo on Sailfish OS (Speech Note 4.0)
- Translator feature video demo on PinePhone (Speech Note 4.0)
- DebugPoint.com (Speech Note 4.0)
- DebugPoint.com video (Speech Note 4.0)
- OMG! Linux (Speech Note 4.0)
- LinuxLinks (Speech Note 4.0)
- The Linux Cast video (Speech Note 4.0)
- CONNECTwww.com (Speech Note 4.0)
- ZDNET (Speech Note 4.2)
Speech Note is an open source project. The source code is released under the Mozilla Public License Version 2.0.
3rd party libraries:
- Coqui STT, released under the Mozilla Public License Version 2.0
- Coqui TTS, released under the Mozilla Public License Version 2.0
- Vosk API, released under the Apache License 2.0
- whisper.cpp, released under the MIT License
- WebRTC, released under this license
- libarchive, released under the BSD License
- RNNoise-nu, released under the BSD 3-Clause License
- {fmt}, released under this license
- Hugging Face Transformers, released under the Apache License 2.0
- Piper, released under the MIT License
- RHVoice, released under the GNU General Public License v2.0
- ssplit-cpp, released under the Apache License 2.0
- espeak-ng, released under the GNU General Public License v3.0
- bergamot-translator, released under the Mozilla Public License 2.0
- Rubber Band Library, released under the GNU General Public License (version 2 or later)
- simdjson, released under the Apache License 2.0
- uroman, released under this license
- astrunc, released under the MIT License
- FFmpeg, released under the GNU Lesser General Public License version 2.1 or later
- LAME, released under the LGPL
- Vorbis, released under this license
- TagLib, released under the GNU Lesser General Public License (LGPL) and Mozilla Public License (MPL)
- libnumbertext, released under the BSD License
- KDBusAddons, released under the LGPL licenses
- QHotkey, released under the BSD-3-Clause License
- faster-whisper, released under the MIT License
- Mimic 3, released under the AGPL-3.0 license
- Unikud, released under the MIT License
- april-asr, released under the GNU General Public License v3.0
- libopus, released under this license
The files in the `nonbreaking_prefixes` directory were copied from the mosesdecoder project and are distributed under the GNU Lesser General Public License v2.1.