New Models
dnhkng committed Dec 11, 2023
1 parent 68be59d commit e7ca4ad
Showing 22 changed files with 3,375 additions and 469 deletions.
1 change: 0 additions & 1 deletion .env_example

This file was deleted.

163 changes: 162 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,163 @@
**/__pycache__/**
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# The large model files
*.bin
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2022 Timmy Knight
Copyright (c) 2022 David Ng

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
35 changes: 25 additions & 10 deletions README.md
@@ -1,25 +1,40 @@
# GLaDOS Personality Core

Automatic Speech Recognition from OpenAI Whisper
Text-to-Speech Engine based on Tacotron 2 and Wavenet vocoder.
"Brain" uses text-davinci-003 from OpenAI.
This is a project dedicated to building a real-life version of GLaDOS.

This works as stand-alone.
```console
python3 glados.py
```
*That is, a hardware and software project that will create an aware, interactive, and embodied GLaDOS.*

The wake-word is 'Glados', but this can be changed in the class variables.
Note: 'Glados' seems to be a hard word to detect accurately! What's nice is that the wake word does not have to be at the beginning; she will respond if you mention her name at any time!
This will entail:
- [x] Train GLaDOS voice generator
- [x] Generate prompt that leads to a realistic "Personality Core"
- [ ] Generate a [MemGPT](https://memgpt.readthedocs.io/en/latest/) medium- and long-term memory for GLaDOS
- [ ] Give GLaDOS vision via [LLaVA](https://llava-vl.github.io/)
- [ ] Create 3D-printable parts
- [ ] Design the animatronics system



## Software Architecture
The initial goal is to develop a low-latency platform, where GLaDOS can respond to voice interactions within 600 ms.

To do this, the system constantly records data to a circular buffer, waiting for [voice to be detected](https://github.com/snakers4/silero-vad). When it determines that the voice has stopped (accounting for normal pauses), the audio is [transcribed quickly](https://github.com/huggingface/distil-whisper). The transcript is then passed to a streaming [local Large Language Model](https://github.com/ggerganov/llama.cpp), whose streamed output is broken into sentences and passed to a [text-to-speech system](https://github.com/rhasspy/piper). This means further sentences can be generated while the current one is playing, reducing latency substantially.
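The circular buffer at the front of this pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code; the capacity and float32 sample format are assumptions for the example.

```python
import numpy as np

class RingBuffer:
    """Fixed-size circular buffer that audio callbacks write into
    while the system waits for voice activity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = np.zeros(capacity, dtype=np.float32)
        self.write_pos = 0
        self.filled = 0

    def write(self, samples: np.ndarray) -> None:
        """Append samples, overwriting the oldest data once full."""
        for s in samples:
            self.buffer[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
        self.filled = min(self.filled + len(samples), self.capacity)

    def read(self) -> np.ndarray:
        """Return the buffered samples in chronological order."""
        if self.filled < self.capacity:
            return self.buffer[: self.filled].copy()
        # Once wrapped, the oldest sample sits at write_pos.
        return np.roll(self.buffer, -self.write_pos).copy()
```

A real audio callback would write fixed-size blocks into this buffer, and the VAD would be run over `read()` when triggered.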

### Subgoals
- Another aim of the project is to minimise dependencies, so this can run on constrained hardware. That means no PyTorch or other large packages.
- As I want to fully understand the system, I have removed a large amount of indirection by extracting and rewriting code. For example, as GLaDOS only speaks English, I have rewritten the wrapper around [espeak](https://espeak.sourceforge.net/), and the entire Text-to-Speech subsystem is about 500 LOC with only 3 dependencies: numpy, onnxruntime, and sounddevice.
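In the same minimal-dependency spirit, the sentence-by-sentence handoff from the streaming LLM to the TTS engine described above can be sketched with only the standard library. This is an illustrative sketch, not the project's actual implementation; the punctuation-based splitting rule is an assumption.

```python
import re

def stream_sentences(token_stream):
    """Yield complete sentences from an incremental token stream,
    so each finished sentence can be handed to TTS while the
    language model is still generating the next one."""
    pending = ""
    for token in token_stream:
        pending += token
        # Split on sentence-ending punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", pending)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            yield sentence
        pending = parts[-1]
    if pending.strip():
        yield pending  # Flush whatever remains at end of stream.
```

Each yielded sentence would be synthesized and queued for playback, overlapping speech synthesis with ongoing text generation.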


## Installation Instructions
If you want to install the TTS Engine on your machine, please follow the steps
below. This has only been tested on Linux, but I think it will work on Windows with small tweaks.

1. Install the [`espeak`](https://github.com/espeak-ng/espeak-ng) synthesizer
according to the [installation
instructions](https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md)
for your operating system.
2. Install the required Python packages, e.g., by running `pip install -r
requirements.txt`
3. For voice recognition, install [Whisper.cpp](https://github.com/ggerganov/whisper.cpp); after compiling, move the "libwhisper.so" file to the "glados" folder or add it to your path. For Windows, check out the discussion in my [whisper pull request](https://github.com/ggerganov/whisper.cpp/pull/1524). Then download the [voice recognition model](https://huggingface.co/distil-whisper/distil-medium.en/resolve/main/ggml-medium-32-2.en.bin?download=true) and put it in the "models" directory.

## Testing
You can test the systems by exploring the 'demo.ipynb' notebook.
110 changes: 110 additions & 0 deletions demo.ipynb
@@ -0,0 +1,110 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Demo the Text-to-Speech module"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glados.tts as tts\n",
"import sounddevice as sd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Instantiate the TTS engine\n",
"glados_tts = tts.TTSEngine()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generate the audio.\n",
"# Glados is spelt incorrectly on purpose to make the pronunciation more accurate.\n",
"audio = glados_tts.generate_speech_audio(\"Hello, my name is Gladohs. I am an AI created by Aperture Science.\")\n",
"\n",
"# Play the audio\n",
"sd.play(audio, tts.RATE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Demo the Automatic Speech Recognition system\n",
"This will detect and transcribe your voice. In this demo, it will then get GLaDOS to repeat back to you what was heard."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glados.voice_recognition as vr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def say_text(text: str):\n",
" \"\"\"Say text using text-to-speech engine\n",
" \"\"\"\n",
" audio = glados_tts.generate_speech_audio(text)\n",
" sd.play(audio, tts.RATE)\n",
" sd.wait()\n",
"\n",
"# Instantiate VoiceRecognition class with the say_text function\n",
"demo = vr.VoiceRecognition(function=say_text)\n",
"\n",
"# Start the demo\n",
"demo.start()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "GLaDOS",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
