Swift is a fast AI voice assistant.
- Groq is used for fast inference of OpenAI Whisper (for transcription) and Meta Llama 3 (for generating the text response).
- Cartesia's Sonic voice model is used for fast speech synthesis, which is streamed to the frontend.
- VAD (voice activity detection) is used to detect when the user is talking and to run callbacks on speech segments.
- The app is a Next.js project written in TypeScript and deployed to Vercel.
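The components above can be sketched as one handler that runs once per speech segment reported by the VAD. This is a minimal illustration, not the project's code. The stage names `transcribe`, `respond`, and `synthesize`, the `Message` shape, and the `handleSpeechSegment` function are all hypothetical. The VAD is assumed to deliver each segment as raw samples in a `Float32Array`. Each stage is passed in as a function, so the real Groq (Whisper, Llama 3) and Cartesia Sonic calls can be plugged in without changing the ordering logic.

```typescript
// Hypothetical conversation message shape (illustrative, not the project's type).
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical stage signatures. In the real app these would wrap Groq Whisper,
// Groq Llama 3, and Cartesia Sonic respectively.
interface Stages {
  transcribe: (audio: Float32Array) => Promise<string>;
  respond: (history: Message[], userText: string) => Promise<string>;
  synthesize: (text: string) => Promise<void>;
}

// Called once per speech segment (assumed to be raw Float32Array samples from the VAD).
// Returns the updated conversation history.
async function handleSpeechSegment(
  audio: Float32Array,
  history: Message[],
  stages: Stages,
): Promise<Message[]> {
  // 1. Speech to text.
  const userText = (await stages.transcribe(audio)).trim();
  // Skip empty transcripts, e.g. when the VAD fired on background noise.
  if (userText === "") return history;

  // 2. Generate a text reply from the conversation so far.
  const reply = await stages.respond(history, userText);

  // 3. Text to speech; the real app streams this audio to the frontend.
  await stages.synthesize(reply);

  return [
    ...history,
    { role: "user", content: userText },
    { role: "assistant", content: reply },
  ];
}
```

Injecting the stages keeps the pipeline order testable with stubs. In a deployment the transcription and chat calls would typically run server-side, so the API keys never reach the browser.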
Thank you to the teams at Groq and Cartesia for providing access to their APIs for this demo!
To run the project locally:

- Clone the repository.
- Create a `.env.local` file with:
  - `GROQ_API_KEY` from console.groq.com.
  - `CARTESIA_API_KEY` from play.cartesia.ai.
- Run `pnpm install` to install dependencies.
- Run `pnpm dev` to start the development server.
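Next.js loads `.env.local` into `process.env` for server-side code. A small check like the one below can report which of the two keys is missing before any API call is made. `missingEnvKeys` is a hypothetical helper for illustration, not part of the project; only the key names come from the setup steps.

```typescript
// The two keys the setup steps ask for in .env.local.
const REQUIRED_KEYS = ["GROQ_API_KEY", "CARTESIA_API_KEY"] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

// Hypothetical helper: returns the required keys that are unset or blank.
function missingEnvKeys(env: Record<string, string | undefined>): RequiredKey[] {
  return REQUIRED_KEYS.filter((key) => !env[key]?.trim());
}
```

For example, calling `missingEnvKeys(process.env)` on the server and throwing if the result is non-empty gives a clearer error than a failed Groq or Cartesia request.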