OpenAI's voice model with GPT-5-class reasoning, parallel tool calls and a 128K-token context window, available via the Realtime API.
Context window: 128K tokens
Release date: 7 May 2026
Access: API
Deployment: ☁ Cloud
Overview
Access & deployment
API
Cloud
Weights: Closed
Key parameters
📏 Context: 128K
✓ Tools
📥 Input: audio, text
Technical specification
Context window: 128K tokens
Features: ✓ Tool use
Modalities
⬇ Input: audio, text
⬆ Output: audio, text
Capabilities and applications
Native model capabilities
Audio understanding
Category: audio
Voice Conversation
Ability to conduct multi-turn real-time voice conversations with context retention and natural speech pacing.
Category: speech
Live Translation
Real-time speech translation between multiple languages without interrupting the audio stream.
Category: speech
Streaming Speech-to-Text
Real-time conversion of speech to text with immediate output as the speaker is talking.
Category: speech
Parallel Tool Calls
Ability to invoke multiple external tools simultaneously while generating a response.
Category: reasoning
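
The parallel tool call pattern described above can be sketched on the client side as follows. This is a minimal illustration, not the Realtime API's actual event schema: the tool functions `get_weather` and `get_time`, the `TOOLS` registry, and the shape of each call record (`call_id`, `name`, JSON-encoded `arguments`) are all assumptions made for the example. It shows only the core idea, which is that several tool requests arriving together are executed concurrently rather than one after another.

```python
import asyncio
import json


# Hypothetical tools: stand-ins for real external services.
async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.1)  # simulate network latency
    return {"city": city, "forecast": "sunny"}


async def get_time(timezone: str) -> dict:
    await asyncio.sleep(0.1)  # simulate network latency
    return {"timezone": timezone, "time": "12:00"}


# Hypothetical registry mapping tool names to coroutines.
TOOLS = {"get_weather": get_weather, "get_time": get_time}


async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run all requested tools concurrently; results keep request order."""

    async def run_one(call: dict) -> dict:
        func = TOOLS[call["name"]]
        args = json.loads(call["arguments"])  # arguments assumed to be a JSON string
        output = await func(**args)
        return {"call_id": call["call_id"], "output": output}

    # asyncio.gather schedules every coroutine at once and
    # returns their results in the order they were passed in.
    return await asyncio.gather(*(run_one(c) for c in calls))


def dispatch(calls: list[dict]) -> list[dict]:
    """Synchronous entry point for callers without a running event loop."""
    return asyncio.run(run_tool_calls(calls))


if __name__ == "__main__":
    requested = [
        {"call_id": "a", "name": "get_weather", "arguments": '{"city": "Oslo"}'},
        {"call_id": "b", "name": "get_time", "arguments": '{"timezone": "UTC"}'},
    ]
    for result in dispatch(requested):
        print(result)
```

Because both simulated tools wait concurrently, the total wait is roughly that of the slowest tool rather than the sum of all of them, which matters for keeping a voice conversation responsive.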
Benchmark results
2 benchmarks
Big Bench Audio
Relative improvement, GPT-Realtime-2 (high): +15.2% vs GPT-Realtime-1.5
Source: OpenAI
Audio MultiChallenge
Relative improvement, GPT-Realtime-2 (xhigh): +13.8% vs GPT-Realtime-1.5
Source: OpenAI
Technical architecture
Core Architecture
