
Realtime API Agents Demo

This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API. In particular, this demonstrates:

  • Sequential agent handoffs according to a defined agent graph (taking inspiration from OpenAI Swarm)
  • Background escalation to more intelligent models like o4-mini for high-stakes decisions
  • Prompting models to follow a state machine, for example to accurately collect items like names and phone numbers with character-by-character confirmation in order to authenticate a user (see the sketch after this list).
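
As a flavor of that last pattern, here is an illustrative fragment of state-machine-style instructions. The real prompts live in the configs under src/app/agentConfigs (see frontDeskAuthentication below) and are more detailed; the state names here are made up for the example:

// Illustrative only: instructions that walk the model through explicit states
// for collecting and confirming a phone number character by character.
const stateMachineInstructions = `
Follow these states in order; do not skip ahead:
1_greeting      - Greet the caller and explain you need to verify their identity.
2_get_phone     - Ask for the caller's phone number.
3_confirm_phone - Read the number back digit by digit and ask if it is correct.
                  If the caller corrects any digit, return to 2_get_phone.
4_complete      - Call the authentication tool, then transfer to the next agent.
`;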

Here's a quick demo video if you'd like a walkthrough. You should be able to use this repo to prototype your own multi-agent realtime voice app in less than 20 minutes!

Screenshot of the Realtime API Agents Demo

Setup

  • This is a Next.js TypeScript app
  • Install dependencies with npm i
  • Add your OPENAI_API_KEY to your env. Either add it to your .bash_profile or equivalent file, or copy .env.sample to .env and add it there.
  • Start the server with npm run dev
  • Open your browser to http://localhost:3000 to see the app. It should automatically connect to the simpleExample Agent Set.

Configuring Agents

Configuration in src/app/agentConfigs/simpleExample.ts

import { AgentConfig } from "@/app/types";
import { injectTransferTools } from "./utils";

// Define agents
const haikuWriter: AgentConfig = {
  name: "haikuWriter",
  publicDescription: "Agent that writes haikus.", // Context for the agent_transfer tool
  instructions:
    "Ask the user for a topic, then reply with a haiku about that topic.",
  tools: [],
};

const greeter: AgentConfig = {
  name: "greeter",
  publicDescription: "Agent that greets the user.",
  instructions:
    "Please greet the user and ask them if they'd like a Haiku. If yes, transfer them to the 'haiku' agent.",
  tools: [],
  downstreamAgents: [haikuWriter],
};

// add the transfer tool to point to downstreamAgents
const agents = injectTransferTools([greeter, haikuWriter]);

export default agents;

This fully specifies the agent set that was used in the interaction shown in the screenshot above.
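
injectTransferTools reads each agent's downstreamAgents and attaches a transferAgents function tool so the model can hand the conversation off. As a rough sketch only (the exact schema is produced by src/app/agentConfigs/utils.ts, and the parameter name below is inferred from the sequence diagram further down), the tool added to greeter could look something like this:

// Illustrative sketch of the transfer tool injectTransferTools might attach to
// `greeter`; one enum entry per downstream agent, described via publicDescription.
const transferAgentsTool = {
  type: "function",
  name: "transferAgents",
  description:
    "Transfer the conversation to another agent. Available agents: " +
    "haikuWriter (Agent that writes haikus.)",
  parameters: {
    type: "object",
    properties: {
      destination: {
        type: "string",
        enum: ["haikuWriter"],
        description: "Name of the agent to transfer to.",
      },
    },
    required: ["destination"],
  },
};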

Sequence Diagram of CustomerServiceRetail Flow

This diagram illustrates the interaction flow defined in src/app/agentConfigs/customerServiceRetail/.

sequenceDiagram
    participant User
    participant WebClient as Next.js Client
    participant NextAPI as /api/session
    participant RealtimeAPI as OpenAI Realtime API
    participant AgentManager as Agents (authentication, returns, sales, simulatedHuman)
    participant o4mini as "o4-mini (Escalation Model)"

    Note over WebClient: User navigates to ?agentConfig=customerServiceRetail
    User->>WebClient: Open Page
    WebClient->>NextAPI: GET /api/session
    NextAPI->>RealtimeAPI: POST /v1/realtime/sessions
    RealtimeAPI->>NextAPI: Returns ephemeral session
    NextAPI->>WebClient: Returns ephemeral token (JSON)

    Note right of WebClient: Start RTC handshake
    WebClient->>RealtimeAPI: Offer SDP (WebRTC)
    RealtimeAPI->>WebClient: SDP answer
    WebClient->>WebClient: DataChannel "oai-events" established

    Note over AgentManager: Default agent is "authentication"
    User->>WebClient: "Hi, I'd like to return my snowboard."
    WebClient->>AgentManager: conversation.item.create (role=user)
    WebClient->>RealtimeAPI: {type: "conversation.item.create"}
    WebClient->>RealtimeAPI: {type: "response.create"}

    authentication->>AgentManager: Requests user info, calls authenticate_user_information()
    AgentManager-->>WebClient: function_call => name="authenticate_user_information"
    WebClient->>WebClient: handleFunctionCall => verifies details

    Note over AgentManager: After user is authenticated
    authentication->>AgentManager: transferAgents("returns")
    AgentManager-->>WebClient: function_call => name="transferAgents" args={ destination: "returns" }
    WebClient->>WebClient: setSelectedAgentName("returns")

    Note over returns: The user wants to process a return
    returns->>AgentManager: function_call => checkEligibilityAndPossiblyInitiateReturn
    AgentManager-->>WebClient: function_call => name="checkEligibilityAndPossiblyInitiateReturn"

    Note over WebClient: The WebClient calls /api/chat/completions with model="o4-mini"
    WebClient->>o4mini: "Is this item eligible for return?"
    o4mini->>WebClient: "Yes/No (plus notes)"

    Note right of returns: Returns uses the result from "o4-mini"
    returns->>AgentManager: "Return is approved" or "Return is denied"
    AgentManager->>WebClient: conversation.item.create (assistant role)
    WebClient->>User: Displays final verdict
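
For reference, the ephemeral-token step at the top of the diagram can be implemented with a small server route. The sketch below is illustrative only, assuming a Next.js App Router handler (e.g. src/app/api/session/route.ts) and the gpt-4o-realtime-preview model; the repo's actual route may differ:

// Sketch of GET /api/session: mint an ephemeral Realtime session server-side
// so the browser never sees OPENAI_API_KEY.
export async function GET() {
  const response = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-4o-realtime-preview" }),
  });
  const session = await response.json();
  // The client sends session.client_secret.value with its WebRTC offer.
  return Response.json(session);
}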

Next steps

  • Check out the configs in src/app/agentConfigs. The example above is a minimal demo that illustrates the core concepts.
  • frontDeskAuthentication: Guides the user through a step-by-step authentication flow, confirming each value character by character, authenticates the user with a tool call, and then transfers to another agent. Note that the second agent is intentionally "bored" to show how to prompt for personality and tone.
  • customerServiceRetail: Also guides the user through an authentication flow, reads a long offer from a canned script verbatim, and then walks through a complex return flow that requires looking up orders and policies, gathering user context, and checking with o4-mini to ensure the return is eligible (a sketch of that check follows this list). To test this flow, say that you'd like to return your snowboard and go through the necessary prompts!
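
As a sketch of the o4-mini escalation in that return flow (the /api/chat/completions route, model name, and "Yes/No (plus notes)" answer come from the sequence diagram above; the helper name and message wording are made up for illustration):

// Hypothetical helper: ask o4-mini, via the app's /api/chat/completions proxy,
// whether a return is eligible, then hand the verdict back to the realtime agent.
async function checkReturnEligibility(order: string, policy: string): Promise<string> {
  const response = await fetch("/api/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "o4-mini",
      messages: [
        {
          role: "system",
          content: "Decide whether this return is eligible. Answer Yes or No with brief notes.",
        },
        { role: "user", content: `Order: ${order}\nReturn policy: ${policy}` },
      ],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // e.g. "Yes/No (plus notes)"
}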

Defining your own agents

Customizing Output Guardrails

Assistant messages are checked for safety and compliance using a guardrail function before being finalized in the transcript. This is implemented in src/app/hooks/useHandleServerEvent.ts as the processGuardrail function, which is invoked on each assistant message to run a moderation/classification check. You can review or customize this logic by editing the processGuardrail function definition and its invocation inside useHandleServerEvent.
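
The sketch below shows the general shape such a check can take; the actual processGuardrail in src/app/hooks/useHandleServerEvent.ts may use a different signature, model, and category set, so treat every name here as an assumption:

// Illustrative guardrail check: classify an assistant message before it is
// finalized in the transcript. Names, model, and categories are assumptions.
type GuardrailVerdict = {
  status: "PASS" | "FAIL";
  category?: string;
  rationale?: string;
};

async function runGuardrailCheck(assistantText: string): Promise<GuardrailVerdict> {
  const response = await fetch("/api/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini", // assumed classifier model
      messages: [
        {
          role: "system",
          content:
            'Classify the assistant message for safety and compliance. Respond with JSON: {"status":"PASS"|"FAIL","category":"...","rationale":"..."}.',
        },
        { role: "user", content: assistantText },
      ],
    }),
  });
  const data = await response.json();
  return JSON.parse(data.choices[0].message.content) as GuardrailVerdict;
}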

UI

  • You can select agent scenarios in the Scenario dropdown, and automatically switch to a specific agent with the Agent dropdown.
  • The conversation transcript is on the left, including tool calls, tool call responses, and agent changes. Click to expand non-message elements.
  • The event log is on the right, showing both client and server events. Click to see the full payload.
  • On the bottom, you can disconnect, toggle between automated voice activity detection and push-to-talk (PTT), turn off audio playback, and toggle logs.

Core Contributors
