This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API. In particular, this demonstrates:
- Sequential agent handoffs according to a defined agent graph (taking inspiration from OpenAI Swarm)
- Background escalation to more intelligent models like o4-mini for high-stakes decisions
- Prompting models to follow a state machine, for example to accurately collect information like names and phone numbers, confirming each entry character by character to authenticate a user.
Here's a quick demo video if you'd like a walkthrough. You should be able to use this repo to prototype your own multi-agent realtime voice app in less than 20 minutes!
- This is a Next.js TypeScript app
- Install dependencies with `npm i`
- Add your `OPENAI_API_KEY` to your env. Either add it to your `.bash_profile` or equivalent file, or copy `.env.sample` to `.env` and add it there.
- Start the server with `npm run dev`
- Open your browser to http://localhost:3000 to see the app. It should automatically connect to the `simpleExample` Agent Set.
Configuration in `src/app/agentConfigs/simpleExample.ts`:

```typescript
import { AgentConfig } from "@/app/types";
import { injectTransferTools } from "./utils";

// Define agents
const haikuWriter: AgentConfig = {
  name: "haikuWriter",
  publicDescription: "Agent that writes haikus.", // Context for the agent_transfer tool
  instructions:
    "Ask the user for a topic, then reply with a haiku about that topic.",
  tools: [],
};

const greeter: AgentConfig = {
  name: "greeter",
  publicDescription: "Agent that greets the user.",
  instructions:
    "Please greet the user and ask them if they'd like a haiku. If yes, transfer them to the 'haikuWriter' agent.",
  tools: [],
  downstreamAgents: [haikuWriter],
};

// Add the transfer tool to point to downstreamAgents
const agents = injectTransferTools([greeter, haikuWriter]);

export default agents;
```
This fully specifies the agent set that was used in the interaction shown in the screenshot above.
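To give a sense of what `injectTransferTools` does, here is a minimal, hypothetical sketch: the type and helper below are simplified stand-ins, not the repo's actual implementation in `src/app/agentConfigs/utils.ts`.

```typescript
// Hypothetical sketch of an injectTransferTools-style helper.
// The real AgentConfig type and helper in this repo may differ.
type Tool = {
  type: "function";
  name: string;
  description: string;
  parameters: object;
};

interface AgentConfig {
  name: string;
  publicDescription: string;
  instructions: string;
  tools: Tool[];
  downstreamAgents?: AgentConfig[];
}

function injectTransferTools(agents: AgentConfig[]): AgentConfig[] {
  for (const agent of agents) {
    const downstream = agent.downstreamAgents ?? [];
    if (downstream.length > 0) {
      // Build a transferAgents tool whose set of destinations comes from
      // each downstream agent's name and publicDescription.
      agent.tools.push({
        type: "function",
        name: "transferAgents",
        description:
          "Transfer the conversation to another agent. Available agents: " +
          downstream
            .map((a) => `${a.name} (${a.publicDescription})`)
            .join(", "),
        parameters: {
          type: "object",
          properties: {
            destination: {
              type: "string",
              enum: downstream.map((a) => a.name),
            },
          },
          required: ["destination"],
        },
      });
    }
  }
  return agents;
}
```

The key design point is that only agents with `downstreamAgents` receive a transfer tool, so leaf agents like `haikuWriter` cannot hand the conversation off further.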
This diagram illustrates the interaction flow defined in `src/app/agentConfigs/customerServiceRetail/`.
```mermaid
sequenceDiagram
    participant User
    participant WebClient as Next.js Client
    participant NextAPI as /api/session
    participant RealtimeAPI as OpenAI Realtime API
    participant AgentManager as Agents (authentication, returns, sales, simulatedHuman)
    participant o4mini as "o4-mini" (Escalation Model)

    Note over WebClient: User navigates to ?agentConfig=customerServiceRetail
    User->>WebClient: Open Page
    WebClient->>NextAPI: GET /api/session
    NextAPI->>RealtimeAPI: POST /v1/realtime/sessions
    RealtimeAPI->>NextAPI: Returns ephemeral session
    NextAPI->>WebClient: Returns ephemeral token (JSON)

    Note right of WebClient: Start RTC handshake
    WebClient->>RealtimeAPI: Offer SDP (WebRTC)
    RealtimeAPI->>WebClient: SDP answer
    WebClient->>WebClient: DataChannel "oai-events" established

    Note over AgentManager: Default agent is "authentication"
    User->>WebClient: "Hi, I'd like to return my snowboard."
    WebClient->>AgentManager: conversation.item.create (role=user)
    WebClient->>RealtimeAPI: {type: "conversation.item.create"}
    WebClient->>RealtimeAPI: {type: "response.create"}

    authentication->>AgentManager: Requests user info, calls authenticate_user_information()
    AgentManager-->>WebClient: function_call => name="authenticate_user_information"
    WebClient->>WebClient: handleFunctionCall => verifies details

    Note over AgentManager: After user is authenticated
    authentication->>AgentManager: transferAgents("returns")
    AgentManager-->>WebClient: function_call => name="transferAgents" args={ destination: "returns" }
    WebClient->>WebClient: setSelectedAgentName("returns")

    Note over returns: The user wants to process a return
    returns->>AgentManager: function_call => checkEligibilityAndPossiblyInitiateReturn
    AgentManager-->>WebClient: function_call => name="checkEligibilityAndPossiblyInitiateReturn"

    Note over WebClient: The WebClient calls /api/chat/completions with model="o4-mini"
    WebClient->>o4mini: "Is this item eligible for return?"
    o4mini->>WebClient: "Yes/No (plus notes)"

    Note right of returns: returns uses the result from "o4-mini"
    returns->>AgentManager: "Return is approved" or "Return is denied"
    AgentManager->>WebClient: conversation.item.create (assistant role)
    WebClient->>User: Displays final verdict
```
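The `function_call` hops in the diagram are resolved client-side. A simplified, hypothetical sketch of that dispatch is below; the event shape and function names are illustrative, not the app's exact API:

```typescript
// Hypothetical sketch of client-side function-call dispatch over the
// "oai-events" data channel. Names are illustrative.
type FunctionCallEvent = { name: string; call_id: string; arguments: string };

interface ActiveAgent {
  name: string;
  // Optional client-side implementations for the agent's tools.
  toolLogic?: Record<string, (args: any) => Promise<any> | any>;
}

async function handleFunctionCall(
  event: FunctionCallEvent,
  agent: ActiveAgent,
  setSelectedAgentName: (name: string) => void
): Promise<any> {
  const args = JSON.parse(event.arguments);

  if (event.name === "transferAgents") {
    // Agent handoff: switch the active agent, then report the outcome
    // back to the model as the tool result.
    setSelectedAgentName(args.destination);
    return { didTransfer: true, destination: args.destination };
  }

  const impl = agent.toolLogic?.[event.name];
  if (impl) {
    // Tool with client-side logic (which may itself call a background model).
    return await impl(args);
  }

  // No local implementation: acknowledge so the model can continue.
  return { result: true };
}
```

The result object would then be sent back as a `conversation.item.create` with the function output, followed by a `response.create` so the model continues speaking.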
- Check out the configs in `src/app/agentConfigs`. The example above is a minimal demo that illustrates the core concepts.
- `frontDeskAuthentication`: Guides the user through a step-by-step authentication flow, confirming each value character by character, authenticates the user with a tool call, and then transfers to another agent. Note that the second agent is intentionally "bored" to show how to prompt for personality and tone.
- `customerServiceRetail`: Also guides through an authentication flow, reads a long offer from a canned script verbatim, and then walks through a complex return flow that requires looking up orders and policies, gathering user context, and checking with `o4-mini` to ensure the return is eligible. To test this flow, say that you'd like to return your snowboard and go through the necessary prompts!
- You can copy these to make your own multi-agent voice app! Once you make a new agent set config, add it to `src/app/agentConfigs/index.ts` and you should be able to select it in the UI in the "Scenario" dropdown menu.
- To see how to define tools and toolLogic, including a background LLM call, see `src/app/agentConfigs/customerServiceRetail/returns.ts`.
- To see how to define a detailed personality and tone, and use a prompt state machine to collect user information step by step, see `src/app/agentConfigs/frontDeskAuthentication/authentication.ts`.
- To see how to wire up Agents into a single Agent Set, see `src/app/agentConfigs/frontDeskAuthentication/index.ts`.
- If you want help creating your own prompt using these conventions, we've included a metaprompt here, or you can use our Voice Agent Metaprompter GPT.
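The background escalation mentioned above can be sketched as a tool whose logic consults a smarter model before answering. This is a simplified illustration, not the actual code in `returns.ts`; the model call is injected as a parameter so the decision logic runs offline, whereas the app would POST to `/api/chat/completions` with `model: "o4-mini"`.

```typescript
// Simplified sketch of background escalation to a higher-reasoning model.
// callModel is injected so the logic is testable without network access;
// in the real app it would be a chat-completions request to "o4-mini".
type CallModel = (prompt: string) => Promise<string>;

async function checkEligibilityAndPossiblyInitiateReturn(
  args: { itemId: string; userContext: string },
  callModel: CallModel
): Promise<{ eligible: boolean; rationale: string }> {
  // Hand the high-stakes decision to the escalation model.
  const prompt =
    `Given the return policy and this context, is item ${args.itemId} ` +
    `eligible for return? Context: ${args.userContext}. ` +
    `Answer YES or NO, with a brief reason.`;
  const verdict = await callModel(prompt);
  return {
    eligible: verdict.trim().toUpperCase().startsWith("YES"),
    rationale: verdict,
  };
}
```

The realtime voice agent then relays the verdict ("Return is approved" or "Return is denied") to the user, keeping the expensive model out of the latency-sensitive audio path.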
Assistant messages are checked for safety and compliance using a guardrail function before being finalized in the transcript. This is implemented in `src/app/hooks/useHandleServerEvent.ts` as the `processGuardrail` function, which is invoked on each assistant message to run a moderation/classification check. You can review or customize this logic by editing the `processGuardrail` function definition and its invocation inside `useHandleServerEvent`.
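As a rough illustration of the shape of such a check: the sketch below is hypothetical and uses a stub classifier so it runs offline, whereas the actual `processGuardrail` performs a model-based moderation/classification pass.

```typescript
// Hypothetical shape of a guardrail check on assistant messages.
// classify() is a stub standing in for a model-based moderation call.
type GuardrailResult = { status: "PASS" | "FAIL"; category?: string };

function classify(text: string): GuardrailResult {
  // Stand-in for a moderation model: flag an obviously disallowed pattern.
  if (/\b(password|ssn)\b/i.test(text)) {
    return { status: "FAIL", category: "sensitive_data" };
  }
  return { status: "PASS" };
}

function processGuardrailSketch(assistantMessage: string): {
  finalText: string;
  guardrail: GuardrailResult;
} {
  const guardrail = classify(assistantMessage);
  return {
    // On failure, replace the message before it is finalized in the transcript.
    finalText:
      guardrail.status === "PASS"
        ? assistantMessage
        : "[message withheld by guardrail]",
    guardrail,
  };
}
```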
- You can select agent scenarios in the Scenario dropdown, and automatically switch to a specific agent with the Agent dropdown.
- The conversation transcript is on the left, including tool calls, tool call responses, and agent changes. Click to expand non-message elements.
- The event log is on the right, showing both client and server events. Click to see the full payload.
- On the bottom, you can disconnect, toggle between automated voice-activity detection and push-to-talk (PTT), turn off audio playback, and toggle logs.