VisionSynth is an innovative computer vision-controlled synthesizer that lets you create music with hand gestures. Using your webcam, it tracks your hand movements and converts them into MIDI signals, allowing for an intuitive and expressive musical performance.
- 🎹 Real-time hand gesture to MIDI conversion
- 👐 Two-hand support:
  - Right hand: Direct pitch and volume control using a pentatonic scale
  - Left hand: Arpeggiator with major triad patterns
- 🎭 Gesture recognition:
  - Open hand position controls notes
  - Fist gesture for sequence control
- 🎛️ Intuitive controls (see the mapping sketch after this list):
  - Y-axis: Pitch (pentatonic scale)
  - X-axis: Volume
- 🎯 Fixed 100 BPM tempo for rhythmic stability (a variable tempo could be added later)
- 🎼 MIDI output through IAC Driver for DAW integration
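
Below is a rough, illustrative sketch of the gesture-to-MIDI mapping described in the feature list (Y-axis to a pentatonic pitch, X-axis to volume, plus the left hand's major-triad arpeggio). It is not the project's actual code; the real logic lives in `backend/app/synth_module.py`, and the exact scale, note range, and constants here are assumptions.

```python
from typing import List, Tuple

PENTATONIC = [0, 2, 4, 7, 9]   # major pentatonic scale degrees (assumed)
BASE_NOTE = 60                 # MIDI middle C (assumed root)
BPM = 100                      # fixed tempo from the feature list

def hand_to_note(x: float, y: float) -> Tuple[int, int]:
    """Map normalized hand coordinates (0..1) to a MIDI (note, velocity) pair.

    The vertical position picks a pitch from the pentatonic scale and the
    horizontal position sets the volume, mirroring the controls listed above.
    """
    steps = int((1.0 - y) * 2 * len(PENTATONIC))   # higher hand -> higher pitch
    octave, degree = divmod(steps, len(PENTATONIC))
    note = BASE_NOTE + 12 * octave + PENTATONIC[degree]
    velocity = max(1, min(127, int(x * 127)))      # further right -> louder
    return note, velocity

def major_triad_arpeggio(root: int) -> List[int]:
    """Left-hand arpeggio pattern: root, major third, perfect fifth."""
    return [root, root + 4, root + 7]
```
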
- Python 3.8+
- Node.js 16+
- macOS (for IAC Driver support)
- Webcam
- DAW (Digital Audio Workstation) that supports MIDI input
- Clone the repository:
git clone https://github.com/egecam/vision-synth.git
cd vision-synth
- Set up the backend:
cd backend
python -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\activate
pip install -r requirements.txt
- Set up the frontend:
cd frontend
npm install
- Configure IAC Driver:
  - Open Audio MIDI Setup (Applications > Utilities)
  - Window > Show MIDI Studio
  - Double-click on IAC Driver
  - Ensure 'Device is online' is checked
  - Add at least one port if none exists (a quick Python check is sketched below)
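
If you want to confirm that the IAC Driver port is visible before wiring up the app, a minimal check like the one below works. It assumes the `mido` package with the `python-rtmidi` backend is installed (`pip install mido python-rtmidi`), which is not necessarily what the backend itself uses, and your port name may differ from the one shown.

```python
import mido

# List the MIDI outputs the system exposes; the IAC Driver bus should appear here.
print(mido.get_output_names())  # e.g. ['IAC Driver Bus 1']

# Send a short test note so you can check that your DAW receives MIDI.
with mido.open_output('IAC Driver Bus 1') as port:  # replace with the name printed above
    port.send(mido.Message('note_on', note=60, velocity=64))
    port.send(mido.Message('note_off', note=60))
```
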
- Start the backend server:
cd backend
uvicorn app.main:app --reload --port 8000
- Start the frontend development server:
cd frontend
npm run dev
- Open your browser and navigate to http://localhost:3000
- Configure your DAW to receive MIDI input from the IAC Driver
- Click "Initialize Audio" in the web interface
- Use hand gestures to create music:
  - Right hand: Move vertically for pitch, horizontally for volume
  - Left hand: Same controls, but plays arpeggios; make a fist to stop the sequence (see the detection sketch below)
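
The fist gesture could be detected from MediaPipe's 21 hand landmarks with a simple heuristic like the one sketched below. This is an assumption about the approach, not the project's actual implementation in `backend/app/main.py`.

```python
# Rough fist-detection heuristic over MediaPipe's 21 hand landmarks (assumed approach).
FINGERTIPS = [8, 12, 16, 20]   # index, middle, ring, pinky tips
PIP_JOINTS = [6, 10, 14, 18]   # the corresponding middle knuckles

def is_fist(landmarks) -> bool:
    """Return True when every fingertip is curled below its middle knuckle.

    `landmarks` is the per-hand landmark list MediaPipe returns; image
    coordinates grow downward, so a curled fingertip has a larger y value
    than the knuckle beneath it.
    """
    return all(landmarks[tip].y > landmarks[pip].y
               for tip, pip in zip(FINGERTIPS, PIP_JOINTS))
```
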
The project is structured as follows:
vision-synth/
├── backend/
│   ├── app/
│   │   ├── main.py             # FastAPI server and hand tracking
│   │   └── synth_module.py     # MIDI synthesis and hand processing
│   └── requirements.txt
└── frontend/
    ├── src/
    │   ├── components/
    │   │   ├── HandTracker.tsx  # Main component with webcam and audio
    │   │   └── SynthEngine.ts   # Audio processing
    │   └── app/
    └── package.json
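
For orientation, the backend's WebSocket entry point in `app/main.py` presumably looks something like the sketch below. The route name, payload shape (coordinates vs. raw frames), and response format are assumptions, not the repository's actual code.

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws")  # route name is an assumption
async def hand_stream(websocket: WebSocket):
    """Receive hand-tracking data from the frontend and answer over the same socket."""
    await websocket.accept()
    while True:
        data = await websocket.receive_json()  # e.g. {"hand": "right", "x": 0.4, "y": 0.7}
        # ...map the coordinates to a MIDI note and send it to the IAC Driver...
        await websocket.send_json({"status": "ok"})
```
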
We welcome contributions! Please see our Contributing Guidelines for details.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
- Built with FastAPI, MediaPipe, and Next.js
- Uses the Web MIDI API and WebSockets for real-time communication
- Inspired by theremin and gesture-controlled instruments