Robots Atlas

GPT Realtime Whisper

gpt-realtime-whisper · Family: GPT
OpenAI streaming speech-to-text model for low-latency realtime transcription, served via the Realtime transcription API.
✓ Active · ✓ Public access · Audio · 📁 GPT
Context window: 16K tokens
Max output: 2,000 tokens
Access: API · Deployment: ☁ Cloud

Overview

GPT-Realtime-Whisper is a specialized OpenAI streaming speech-to-text model designed for realtime applications that require low-latency transcript deltas emitted while the speaker is still talking. The model lets developers tune the trade-off between latency and transcription accuracy.

Transcription sessions use a session type of 'transcription' and support both WebSocket transport (24 kHz mono PCM, base64-encoded) and WebRTC. Server VAD (voice activity detection) can be configured with threshold, prefix_padding_ms, and silence_duration_ms parameters, or audio buffers can be committed manually. The model emits conversation.item.input_audio_transcription.delta and .completed events.
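The session setup and event shapes described above can be sketched in Python. The event type names (`transcription_session.update`, `input_audio_buffer.append`, `input_audio_buffer.commit`) and VAD parameters follow the Realtime transcription API as summarized here; treat the exact payload shapes as assumptions rather than a definitive client implementation.

```python
import base64

def build_session_update(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500):
    """Configure a transcription session with server VAD (field names assumed)."""
    return {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",  # 24 kHz mono PCM over WebSocket
            "input_audio_transcription": {"model": "gpt-realtime-whisper"},
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            },
        },
    }

def build_audio_append(pcm_bytes):
    """Wrap a chunk of raw PCM16 audio as a base64-encoded append event."""
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }

def build_commit():
    """Manually commit the audio buffer when server VAD is not used."""
    return {"type": "input_audio_buffer.commit"}
```

In a real client, each of these dicts would be JSON-serialized and sent over the WebSocket; with server VAD enabled, the commit event is unnecessary because the server segments turns on silence.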

Typical use cases include live captions, meeting transcription, lecture capture, telephony transcription, broadcast captioning, and dictation. Pricing is based on audio duration (USD per minute) rather than tokens. Within OpenAI's transcription model family, it is positioned as a natively streaming alternative to GPT-4o Transcribe, GPT-4o mini Transcribe, and Whisper-1.
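Because billing is per minute of audio rather than per token, cost estimation reduces to simple duration arithmetic. The rate below is a placeholder for illustration only, not a published price.

```python
RATE_USD_PER_MINUTE = 0.006  # hypothetical placeholder rate, not a real price

def transcription_cost(duration_seconds: float, rate: float = RATE_USD_PER_MINUTE) -> float:
    """Duration-based pricing: cost scales with audio minutes, not tokens."""
    return (duration_seconds / 60.0) * rate

# A one-hour meeting at the placeholder rate:
print(f"${transcription_cost(3600):.2f}")  # prints "$0.36"
```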

Classification
Audio
Family: GPT
Access & deployment
API
Cloud
Weights: Closed
Key parameters
📏 Context: 16K tokens
📥 Input: audio, text

Technical specification

Context window: 16K tokens
Max output tokens: 2,000 per response
Knowledge cutoff: 30 Sept 2024
Modalities:
⬇ Input: audio, text
⬆ Output: text

Capabilities and applications

Native model capabilities
Streaming Speech-to-Text
Real-time conversion of speech to text with immediate output as the speaker is talking.
Category: speech
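As a sketch of how a client might consume this streaming capability: partial `delta` events extend the caption currently on screen, and a `completed` event replaces it with the final transcript. The event type names match those listed in the overview; the dispatch logic itself is an illustrative assumption.

```python
# Minimal sketch of turning the model's transcription events into live captions.
# Event type names come from the model description; the client logic is illustrative.

def caption_stream(events):
    """Yield the caption line after each event: partials grow, completed lines reset."""
    current = ""
    for event in events:
        if event["type"] == "conversation.item.input_audio_transcription.delta":
            current += event["delta"]   # low-latency partial transcript
            yield current
        elif event["type"] == "conversation.item.input_audio_transcription.completed":
            yield event["transcript"]   # final transcript for the turn
            current = ""                # start the next caption

demo = [
    {"type": "conversation.item.input_audio_transcription.delta", "delta": "Hello"},
    {"type": "conversation.item.input_audio_transcription.delta", "delta": " world"},
    {"type": "conversation.item.input_audio_transcription.completed",
     "transcript": "Hello, world."},
]
print(list(caption_stream(demo)))  # ['Hello', 'Hello world', 'Hello, world.']
```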

Technical architecture

Core Architecture