Tinkering with text-to-speech (TTS), and with how to make it sound more like a human reader who has to pause for breath while speaking.
This code is intended to act as a basis for a series of voice-based art projects, each demonstrating a different way voice can be incorporated into an interactive art project.
A general problem to consider: a huge volume of information. How do you find what you need?
Smart indexes, including the bookshelf: book and article metadata for small indexes.
Use the Gutenberg metadata files directly, as in https://github.com/openzim/gutenberg/blob/master/gutenbergtozim/rdf.py
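A minimal sketch of reading one Gutenberg RDF record with only the standard library, in the spirit of the linked rdf.py but not taken from it. The namespace URIs are the ones Gutenberg's RDF catalog files use; the embedded record is a trimmed, hypothetical example, and parse_book is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Namespaces used in Project Gutenberg's RDF catalog files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Hypothetical, heavily trimmed record, for illustration only.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/99999">
    <dcterms:title>An Example Story</dcterms:title>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/1">
        <pgterms:name>Doe, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <pgterms:bookshelf>
      <rdf:Description>
        <rdf:value>Example Bookshelf</rdf:value>
      </rdf:Description>
    </pgterms:bookshelf>
  </pgterms:ebook>
</rdf:RDF>"""


def parse_book(rdf_text):
    """Return a small metadata dict for the first pgterms:ebook, or None."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None
    about = ebook.get(f"{{{NS['rdf']}}}about", "")  # e.g. "ebooks/99999"
    title = ebook.findtext("dcterms:title", default="", namespaces=NS)
    authors = [n.text for n in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)]
    shelves = [v.text for v in ebook.findall("pgterms:bookshelf/rdf:Description/rdf:value", NS)]
    return {
        "id": about.rsplit("/", 1)[-1],
        "title": title.strip(),
        "authors": authors,
        "bookshelves": shelves,
    }


if __name__ == "__main__":
    print(parse_book(SAMPLE_RDF))
```

Keeping the bookshelf names per book is what would let a small index be filtered to just the chosen bookshelves.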
A Makefile is included for common tasks.
% make
This should build a virtual environment (in /.env), install the requirements, and start the main entry point.
The current implementation (0.0.2) uses vosk for speech recognition and flite for speech synthesis.
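The project's own flite wrapper is not reproduced here. As a hedged sketch of one way to drive flite from Python with SSML input, the snippet below writes the markup to a temporary file and runs the flite command line. It assumes a flite build that accepts -ssml, -f, -voice and -o (check flite -h on your system), and flite may honor only part of SSML; build_flite_command and speak_ssml are hypothetical names.

```python
import os
import subprocess
import tempfile


def build_flite_command(ssml_path, wav_path=None, voice=None):
    """Build the flite argument list (assumes this flite build supports -ssml)."""
    cmd = ["flite", "-ssml", "-f", ssml_path]
    if voice:
        cmd += ["-voice", voice]
    # With -o flite writes a WAV file; without it, flite plays the audio itself.
    if wav_path:
        cmd += ["-o", wav_path]
    return cmd


def speak_ssml(ssml, wav_path=None, voice=None):
    """Write SSML to a temporary file, run flite on it, then clean up."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".ssml", delete=False, encoding="utf-8"
    ) as f:
        f.write(ssml)
        path = f.name
    try:
        subprocess.run(build_flite_command(path, wav_path, voice), check=True)
    finally:
        os.remove(path)


# Example (needs flite installed; "slt" is one of flite's bundled voices):
# speak_ssml('<speak>Hello.<break time="500ms"/>Goodbye.</speak>', voice="slt")
```

Going through a file rather than -t keeps long stories and quoting out of the command line.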
Next steps:
- Logging
- TTS that supports SSML (done: added a wrapper for flite)
- Clean up pauses and text buffering (better with flite)
- Clean up empty ASR responses
- Add a voice menu for games
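For the "clean up empty ASR responses" item, a minimal sketch: vosk's recognizer returns JSON strings (a "text" field for final results, "partial" for partial ones), so empty recognitions can be dropped before they reach the game. extract_text is a hypothetical helper, not part of vosk.

```python
import json


def extract_text(result_json):
    """Return recognized text from a vosk result string, or None if it is empty.

    Handles final results ({"text": ...}) and partial results ({"partial": ...}).
    """
    try:
        data = json.loads(result_json)
    except (TypeError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    text = data.get("text", data.get("partial", ""))
    text = " ".join(text.split())  # collapse stray whitespace
    return text or None


# Typical use inside a vosk loop (requires the vosk package and a model):
#   if rec.AcceptWaveform(chunk):
#       text = extract_text(rec.Result())
#       if text:
#           handle(text)  # hypothetical handler
```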
Started testing on a Raspberry Pi: flite and zmachine work, but the program breaks when no microphone is attached.
On a Raspberry Pi, you need:
sudo apt install flite python3-pyaudio python3-soundfile
pip install -r requirements.txt
pip install https://github.com/alphacep/vosk-api/releases/download/v0.3.42/vosk-0.3.42-py3-none-linux_aarch64.whl
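Since the program currently breaks without a mic, one possible guard is to look for an input device before starting speech recognition. This sketch uses PyAudio's device-enumeration calls (get_device_count, get_device_info_by_index); first_input_index and microphone_index are hypothetical names, and the keyboard fallback is only an illustration.

```python
def first_input_index(device_infos):
    """Return the index of the first device that has input channels, or None."""
    for index, info in enumerate(device_infos):
        if info.get("maxInputChannels", 0) > 0:
            return index
    return None


def microphone_index():
    """Ask PyAudio for a usable input device; None if PyAudio or a mic is missing."""
    try:
        import pyaudio
    except ImportError:
        return None
    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
        return first_input_index(infos)
    finally:
        pa.terminate()


# Hypothetical fallback: read typed commands when no microphone is found.
# if microphone_index() is None:
#     heard = input("> ")
```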
Some of the specific ideas:
- A magical storyteller, trapped in a mundane object, that tells stories.
- Add simple controls of some kind and a minimal display for choosing among many stories.
- Add voice recognition to:
  - Search for stories
  - Provide 'fact' evaluation (answer queries)
  - Interactive fiction
Challenges:
- Speech: think about how humans pause and breathe while speaking, and build an SSML version that reflects this.
- Could take advantage of multiple voices on higher-end systems.
- ASR, high-quality voices and large knowledge bases don't fit on microcontroller-class boards.
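One possible starting point for the breathing idea: a sketch that wraps each sentence in an SSML s element and inserts break elements, short ones after commas, semicolons and colons, longer ones between sentences, roughly where a reader might take a breath. The split rules and pause lengths are arbitrary assumptions, and text_to_breathing_ssml is a hypothetical name.

```python
import re
from xml.sax.saxutils import escape

# Split after sentence-ending punctuation, and after clause punctuation.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
CLAUSE_SPLIT = re.compile(r"(?<=[,;:])\s+")


def text_to_breathing_ssml(text, clause_pause="250ms", sentence_pause="600ms"):
    """Wrap sentences in <s> and add <break> pauses where a reader might breathe.

    The pause lengths are illustrative guesses, not tuned values.
    """
    clause_break = f'<break time="{clause_pause}"/>'
    sentence_break = f'<break time="{sentence_pause}"/>'
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    rendered = []
    for sentence in sentences:
        # Escape each clause separately so the inserted tags are not escaped.
        clauses = CLAUSE_SPLIT.split(sentence)
        rendered.append("<s>" + clause_break.join(escape(c) for c in clauses) + "</s>")
    return "<speak>" + sentence_break.join(rendered) + "</speak>"
```

Punctuation is only a rough proxy for where breaths fall; a fuller version might also break long clauses by word count.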
Simple deployment models include:
- Low-quality TTS, with or without simple controls, for Feathers, Picos, etc. (my Kindle could do that, but for story-ssml!)
- Mid-quality ASR and TTS on a RasPi4 or CM4.
- High-quality TTS using the cloud. ASR as well?
For now, focus on an all-in-one RasPi running ASR -> IF7 -> TTS:
- Smallest
- Fastest
- Performant on RP4
- Service overview (local vs cloud)
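A hedged sketch of the all-in-one ASR -> IF -> TTS loop. The three stages are passed in as plain callables, so a vosk recognizer, the z-machine interpreter and flite could be plugged in, or replaced with stubs on a machine without audio. run_session, listen, play_turn and speak are hypothetical names, not this project's interfaces, and the adapters themselves are not shown.

```python
from typing import Callable, Optional, Sequence


def run_session(
    listen: Callable[[], Optional[str]],
    play_turn: Callable[[str], str],
    speak: Callable[[str], None],
    max_turns: int = 100,
    quit_words: Sequence[str] = ("quit", "stop", "goodbye"),
) -> int:
    """Loop ASR -> interactive fiction -> TTS; return how many commands reached the game.

    max_turns bounds every listening attempt, including empty ones, so a silent
    or broken microphone cannot spin forever.
    """
    commands = 0
    for _ in range(max_turns):
        heard = listen()
        if not heard:
            # Empty recognition: ask again rather than sending "" to the game.
            speak("Sorry, I didn't catch that.")
            continue
        if heard.strip().lower() in quit_words:
            speak("Goodbye.")
            break
        speak(play_turn(heard))
        commands += 1
    return commands
```

Keeping the stages as callables also makes it easy to compare a local and a cloud TTS behind the same loop.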
Generate SSML from text, HTML and Markdown.
Source -> SSML -> Engine -> Audio
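As an example of the Source -> SSML stage for HTML input, here is a minimal converter built on Python's html.parser. It keeps paragraphs, reads headings as emphasized paragraphs, turns br into a weak break and drops script/style content and all other markup. The tag mapping is an assumption chosen for illustration, it expects well-formed input with closed p tags, and HtmlToSsml and html_to_ssml are hypothetical names.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class HtmlToSsml(HTMLParser):
    """Tiny HTML -> SSML converter: keeps paragraphs and headings, drops other markup."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "p":
            self.parts.append("<p>")
        elif tag in self.HEADINGS:
            # Read headings with strong emphasis, as their own paragraph.
            self.parts.append('<p><emphasis level="strong">')
        elif tag == "br":
            self.parts.append('<break strength="weak"/>')

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p":
            self.parts.append("</p>")
        elif tag in self.HEADINGS:
            self.parts.append("</emphasis></p>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse whitespace runs and escape &, < and > for XML.
            self.parts.append(escape(re.sub(r"\s+", " ", data)))


def html_to_ssml(html):
    """Convert a well-formed HTML fragment into an SSML <speak> document."""
    parser = HtmlToSsml()
    parser.feed(html)
    parser.close()
    return "<speak>" + "".join(parser.parts) + "</speak>"
```

A Markdown source could reuse the same converter after rendering the Markdown to HTML.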
What engines support SSML?
Future of SSML: https://github.com/w3c/pronunciation/
Scrapes a few categories from Gutenberg to generate a ZIM file of those stories, keeping it small and lean.
Starting categories:
- https://www.gutenberg.org/ebooks/bookshelf/20
- https://www.gutenberg.org/ebooks/bookshelf/18
- https://www.gutenberg.org/ebooks/bookshelf/216
- https://www.gutenberg.org/ebooks/bookshelf/17
- https://www.gutenberg.org/ebooks/bookshelf/218
- https://www.gutenberg.org/ebooks/bookshelf/213
- Simple voice query (vosk)
- Local search for matching data (grep in py?)
- Present results as voice menu
- Read the selected story.
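The query, search and voice-menu steps above could start as simply as this grep-like sketch: rank titles by how many query words they contain, read the top hits as a numbered menu, and map a spoken reply back to a title. The stop-word list is arbitrary, parse_choice assumes the recognizer spells numbers out as words, and all names here are hypothetical.

```python
import re

# Words too common to be useful search terms (an arbitrary, illustrative list).
STOP_WORDS = {"a", "an", "the", "of", "and", "me", "tell", "find", "story", "about"}
NUMBER_WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def search_stories(query, titles, limit=5):
    """Rank titles by how many query words they contain, as whole words, ignoring case."""
    words = [w for w in re.findall(r"[a-z0-9']+", query.lower()) if w not in STOP_WORDS]
    scored = []
    for title in titles:
        lowered = title.lower()
        score = sum(1 for w in words if re.search(rf"\b{re.escape(w)}\b", lowered))
        if score:
            scored.append((score, title))
    # Highest score first; ties fall back to alphabetical order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


def voice_menu(results):
    """Build the sentence the TTS engine would read to offer results as a menu."""
    if not results:
        return "I couldn't find any matching stories."
    options = [
        f"say {NUMBER_WORDS[i]} for {title}"
        for i, title in enumerate(results[: len(NUMBER_WORDS)])
    ]
    return "I found these stories: " + "; ".join(options) + "."


def parse_choice(heard, results):
    """Map a spoken reply such as "number two" back to a result, or None."""
    choices = NUMBER_WORDS[: len(results)]
    for word in heard.lower().split():
        if word in choices:
            return results[choices.index(word)]
    return None
```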
- al.h (the OpenAL header) on macOS: /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/OpenAL.framework/Headers
Forked from https://github.com/ravdin/zmachine
A Python 3 implementation of a Z-machine interpreter for playing Infocom games. To play a Z-machine file:
python zmachine [GAME_FILE]
The interpreter supports versions 3 and 4 (version 4 games include Trinity, AMFV, and Bureaucracy). Save files are in Quetzal format and should be compatible with the Frotz interpreter.
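Quetzal files are IFF containers: a FORM chunk of type IFZS holding chunks such as IFhd, CMem or UMem, and Stks, each with a 4-byte ID, a 4-byte big-endian length and padding to an even size. As a hedged sketch, independent of the interpreter's own save code, this lists the chunks in a save file, which is handy when checking compatibility with Frotz saves. list_quetzal_chunks is a hypothetical name.

```python
import struct


def list_quetzal_chunks(data):
    """Return (chunk_id, length) pairs from a Quetzal (IFF FORM/IFZS) save file."""
    if len(data) < 12 or data[0:4] != b"FORM" or data[8:12] != b"IFZS":
        raise ValueError("not a Quetzal save file")
    # The FORM length counts everything after the length field, including "IFZS".
    (form_len,) = struct.unpack(">I", data[4:8])
    end = min(8 + form_len, len(data))
    pos = 12
    chunks = []
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4].decode("ascii", errors="replace")
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        chunks.append((chunk_id, length))
        # IFF chunks are padded to an even number of bytes.
        pos += 8 + length + (length & 1)
    return chunks


# Example: print(list_quetzal_chunks(open("game.qzl", "rb").read()))  # hypothetical file
```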
The original Zork trilogy (written by Tim Anderson, Marc Blank, Bruce Daniels, and Dave Lebling) is in the games directory.
The Z-Machine Standards Document, by Graham Nelson, is an indispensable guide for decoding the z-machine instructions.
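Text is a good first exercise with the Standards Document. This is a sketch of its string-packing scheme: each 16-bit word holds three 5-bit Z-characters, the top bit marks the last word, Z-characters 4 and 5 shift to alphabets A1 and A2 for one character (version 3 and later), and A2's Z-character 6 starts a 10-bit ZSCII escape. It is an independent illustration, not code from the interpreter; abbreviations are skipped rather than expanded, and ZSCII codes are mapped with chr(), which is only right for ASCII.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Z-chars 7..31 of alphabet A2 (z-char 6 in A2 is the 10-bit ZSCII escape).
A2 = "\n0123456789.,!?_#'\"/\\-:()"


def zchars_from_bytes(data):
    """Unpack big-endian 16-bit words into 5-bit Z-characters, stopping at the end bit."""
    zchars = []
    for i in range(0, len(data) - 1, 2):
        word = (data[i] << 8) | data[i + 1]
        zchars += [(word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F]
        if word & 0x8000:
            break
    return zchars


def decode_zstring(data):
    """Decode a packed Z-machine string using version 3+ shift rules."""
    zchars = zchars_from_bytes(data)
    out = []
    alphabet = 0  # 0 = A0, 1 = A1, 2 = A2; a shift lasts for one character
    i = 0
    while i < len(zchars):
        z = zchars[i]
        i += 1
        if z in (4, 5):
            alphabet = z - 3
            continue
        if z == 0:
            out.append(" ")
        elif z in (1, 2, 3):
            i += 1  # skip the abbreviation index (abbreviations not expanded here)
        elif alphabet == 2 and z == 6:
            # 10-bit ZSCII escape: the next two z-chars hold the character code.
            if i + 1 < len(zchars):
                out.append(chr((zchars[i] << 5) | zchars[i + 1]))
            i += 2
        elif alphabet == 2:
            out.append(A2[z - 7])
        else:
            out.append((A0, A1)[alphabet][z - 6])
        alphabet = 0
    return "".join(out)
```

Trailing 5s are padding: they shift to A2 with nothing left to print, so they produce no output.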
This space intentionally left blank.