GPT Realtime 2

2 · Family: GPT
OpenAI's voice model with GPT-5-class reasoning, parallel tool calls and a 128K-token context window, available via the Realtime API.
✓ Active · ✓ Public access · Audio · Multimodal · Reasoning model · 📁 GPT
Context window: 128K tokens
Release date: 7 May 2026
Access: API · Deployment: ☁ Cloud

Overview

GPT-Realtime-2 is an audio model released by OpenAI on May 7, 2026, as part of the Realtime API. It combines GPT-5-class reasoning, parallel tool calls, and a 128K-token context window (up from 32K in the previous version). A new "preamble" feature lets the model speak short acknowledgement phrases ("let me check that", "one moment") before generating a full response, and announce tool calls aloud while they are in progress.

On benchmarks reported by OpenAI, GPT-Realtime-2 (high) scores 15.2% higher than its predecessor GPT-Realtime-1.5 on Big Bench Audio (audio reasoning), and GPT-Realtime-2 (xhigh) scores 13.8% higher on Audio MultiChallenge (multi-turn conversation); both figures are relative improvements. Early tester Zillow reported a 26-point increase in call success rate (95% vs. 69%) after prompt optimization. The model is accessible via WebRTC, WebSocket, and SIP, with full EU Data Residency support.
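
The figures above mix two kinds of change: the benchmark results are listed as relative improvements, while the Zillow result is a difference in percentage points. The short sketch below shows the distinction using only the Zillow numbers quoted in the text. The function names are illustrative, and the underlying benchmark scores are not given in the source, so no attempt is made to reconstruct them.

```python
def point_change(new_pct: float, old_pct: float) -> float:
    """Absolute change, in percentage points."""
    return new_pct - old_pct


def relative_change(new_pct: float, old_pct: float) -> float:
    """Change relative to the old value, expressed as a percentage."""
    return (new_pct - old_pct) / old_pct * 100.0


# Call success rate reported by Zillow: 95% after vs. 69% before.
points = point_change(95.0, 69.0)
relative = relative_change(95.0, 69.0)

print(f"{points:.0f} points, {relative:.1f}% relative")
# prints: 26 points, 37.7% relative
```

The same 69% to 95% jump reads as 26 points or as roughly 38% depending on the convention, which is why the two kinds of figure should not be compared directly.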

Classification
Audio · Multimodal · Reasoning model
Family: GPT
Access & deployment
API
Cloud
Weights: Closed
Key parameters
📏 Context: 128K
Tools
📥 Input: audio, text

Technical specification

Context window: 128K tokens
Features: Tool use
Modalities
⬇ Input: audio, text
⬆ Output: audio, text

Capabilities and applications

Native model capabilities
Audio understanding
Category: audio
Voice Conversation
Ability to conduct multi-turn real-time voice conversations with context retention and natural speech pacing.
Category: speech
Live Translation
Real-time speech translation between multiple languages without interrupting the audio stream.
Category: speech
Streaming Speech-to-Text
Real-time conversion of speech to text with immediate output as the speaker is talking.
Category: speech
Parallel Tool Calls
Ability to invoke multiple external tools simultaneously while generating a response.
Category: reasoning
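
As a rough illustration of what handling parallel tool calls can look like on the application side, the sketch below runs several tool handlers concurrently with `asyncio.gather`. It is a minimal sketch under stated assumptions: the tool names (`lookup_listing`, `check_calendar`), their arguments, and the shape of the `calls` list are hypothetical, and nothing here reproduces the Realtime API's actual event format.

```python
import asyncio


# Hypothetical tool handlers; asyncio.sleep stands in for network latency.
async def lookup_listing(listing_id: str) -> dict:
    await asyncio.sleep(0.2)
    return {"listing_id": listing_id, "status": "active"}


async def check_calendar(day: str) -> dict:
    await asyncio.sleep(0.2)
    return {"day": day, "slots": ["10:00", "14:30"]}


# Registry mapping tool names to handlers (names are assumptions).
TOOLS = {"lookup_listing": lookup_listing, "check_calendar": check_calendar}


async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Start every requested tool at once and wait for all of them.

    Results come back in the same order as the requests.
    """
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    requested = [
        {"name": "lookup_listing", "arguments": {"listing_id": "A-17"}},
        {"name": "check_calendar", "arguments": {"day": "2026-05-08"}},
    ]
    for result in asyncio.run(run_tool_calls(requested)):
        print(result)
```

Because both handlers wait at the same time, the total latency is close to that of the slowest tool rather than the sum of all of them, which matters in a voice conversation where pauses are audible.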

Benchmark results

2 benchmarks
Big Bench Audio
relative improvement · GPT-Realtime-2 (high)
+15.2% vs GPT-Realtime-1.5
📄 OpenAI
Audio MultiChallenge
relative improvement · GPT-Realtime-2 (xhigh)
+13.8% vs GPT-Realtime-1.5
📄 OpenAI