Clank is a voice-controlled LED automation project that combines speech recognition, local AI models, and ESP32-controlled hardware. Built on top of the Moonshine speech recognition system, it enables voice-activated control of LED lights through natural language commands.
Demo screencast: `Screencast_20241123_181801.webm`
Clank allows users to control LED lights through simple spoken commands. The system flow is:
- User Speech: Audio is captured via the default microphone using sounddevice
- Voice Activity Detection: Using Silero VAD to detect speech segments
- Transcription: Speech is transcribed into text using Moonshine's speech recognition model
- AI Processing: The text is sent to a locally hosted LLM (running at `127.0.0.1:5000`) for interpretation (see the sketch below)
- LED Control: The LLM returns structured JSON output, which will be used to control LEDs via ESP32 GPIOs
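For reference, here is a minimal sketch of the interpretation step. It assumes the local server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (text-generation-webui's default); the endpoint, prompt, and parsing below are illustrative, not the exact code in `voice_LED_control.py`:

```python
import json
import requests

LLM_URL = "http://127.0.0.1:5000/v1/chat/completions"  # assumed OpenAI-compatible endpoint

SYSTEM_PROMPT = (
    "You control LEDs. Reply only with JSON of the form "
    '{"action": "led_control", "parameters": {"color": ..., "state": ..., "brightness": ...}}'
)

def interpret(transcript: str) -> dict:
    """Send a transcribed command to the local LLM and parse its JSON reply."""
    payload = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": transcript},
        ],
        "temperature": 0.1,
    }
    response = requests.post(LLM_URL, json=payload, timeout=30)
    response.raise_for_status()
    reply = response.json()["choices"][0]["message"]["content"]
    return json.loads(reply)  # e.g. {"action": "led_control", ...}

if __name__ == "__main__":
    print(interpret("Computer set blue LED to 50%"))
```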
The project has achieved several key milestones:
- Speech Recognition: Successfully implemented using Moonshine's ONNX models
- Voice Activity Detection: Integrated Silero VAD for accurate speech detection
- Command Processing: LLM successfully generates structured JSON responses for LED control
- Example Response: `{"action": "led_control", "parameters": {"color": "blue", "state": "on", "brightness": 50}}`
- Audio Capture: Uses sounddevice for real-time audio input
- Speech Detection: Silero VAD for precise voice activity detection
- Speech Recognition: Moonshine-powered transcription
- Command Processing: Local LLM interpretation with structured JSON output
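The capture and detection stages above come down to a handful of library calls. As a rough illustration, here is a blocking-record version (the actual script processes audio in a streaming loop, and the recording length here is arbitrary):

```python
import sounddevice as sd
import torch

SAMPLE_RATE = 16_000  # Silero VAD and Moonshine both expect 16 kHz mono audio

# Record a few seconds from the default microphone
seconds = 4
audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
               channels=1, dtype="float32")
sd.wait()

# Load Silero VAD from torch.hub and find speech segments
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad", trust_repo=True)
get_speech_timestamps = utils[0]

wav = torch.from_numpy(audio.squeeze())
segments = get_speech_timestamps(wav, vad_model, sampling_rate=SAMPLE_RATE)
print(segments)  # e.g. [{'start': 4320, 'end': 38240}] -> these samples go to Moonshine
```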
- ESP32 Integration: Development of firmware to receive and process LLM commands
- LED Control: GPIO management for LED state and brightness control
- Extended Hardware Control: Support for multiple LED arrays
- Advanced Voice Commands: More complex lighting patterns and scenes
- Web Interface: Configuration and monitoring dashboard
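Since the ESP32 integration above is still on the roadmap, the firmware below is only one possible shape for it, sketched in MicroPython with an assumed color-to-pin mapping; how the JSON reaches the board (serial, Wi-Fi, MQTT) is still an open question:

```python
# MicroPython sketch for an ESP32 (hypothetical wiring and pin mapping)
from machine import Pin, PWM

LED_PINS = {"red": 25, "green": 26, "blue": 27}  # assumed GPIO assignments
channels = {color: PWM(Pin(gpio), freq=1000) for color, gpio in LED_PINS.items()}

def apply_command(cmd):
    """Apply a parsed led_control command to the matching PWM channel."""
    params = cmd["parameters"]
    pwm = channels[params["color"]]
    if params.get("state") == "off":
        pwm.duty(0)
    else:
        # Map 0-100 % brightness onto the ESP32's 10-bit duty range (0-1023)
        pwm.duty(int(params.get("brightness", 100) * 1023 // 100))

apply_command({"action": "led_control",
               "parameters": {"color": "blue", "state": "on", "brightness": 50}})
```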
To get Clank running locally:

- Set the required environment variable: `export KERAS_BACKEND=torch`
- Clone the repository: `git clone https://github.com/cycloarcane/clank.git`, then `cd clank`
- Install Python dependencies: `pip install -r requirements.txt`
- Configure the local LLM: ensure your local LLM server is running at `127.0.0.1:5000` (a quick check is sketched below)
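One quick way to confirm the server is reachable before launching Clank, assuming it exposes an OpenAI-compatible API (as text-generation-webui does with its API extension):

```python
import requests

# List the models the local server reports; a JSON response means it is up.
print(requests.get("http://127.0.0.1:5000/v1/models", timeout=5).json())
```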
Run the voice control script: `python voice_LED_control.py`
Available voice commands:
- "Computer turn on red LED"
- "Computer set blue LED to 50%"
- "Computer turn off green LED"
Project layout:
- `voice_LED_control.py`: Main script for voice capture and processing
- `onnx_model.py`: Moonshine model wrapper for speech recognition
- `requirements.txt`: Python dependencies
- `README.md`: Project documentation
This project builds heavily on the Moonshine speech recognition system and its live_captions demo. Special thanks to the Moonshine team:
@misc{jeffries2024moonshinespeechrecognitionlive,
title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
year={2024},
eprint={2410.15608},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2410.15608},
}
Contributions are welcome! If you'd like to help build Clank, please submit a pull request or open an issue for any feature requests or bug fixes.
For questions or support:
- Email: [email protected]
- GitHub: cycloarcane
This project is licensed under a modified, non-commercial GNU GPL v3.0 license.
Join us in building the future of voice-controlled lighting! 🎤💡