GPT Realtime Translate

gpt-realtime-translate · Family: GPT
Streaming speech-to-speech translation model served via OpenAI's dedicated Realtime translation endpoint for live multilingual audio.
✓ Active · ✓ Public access · Audio → Audio · 📁 GPT
Context window: 16K tokens
Max output: 2,000 tokens
Access: API · Deployment: ☁ Cloud

Overview

GPT-Realtime-Translate is a specialized OpenAI model for streaming speech-to-speech translation, released as part of the Realtime API family. It ingests source audio and returns translated audio plus transcript deltas while the speaker is still talking, and it can provide transcripts in both the source and the target language.

Translation sessions use a dedicated /v1/realtime/translations endpoint rather than the standard /v1/realtime endpoint used by voice agents. The architecture is built around continuous audio streaming (no response.create calls), with the model acting as an interpreter rather than an assistant. Both WebRTC (media track transport) and WebSocket (base64-encoded 24 kHz PCM16) connections are supported.
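To make the WebSocket audio format concrete, here is a minimal sketch of preparing base64-encoded 24 kHz PCM16 chunks for streaming. The JSON envelope is an assumption: the event name `input_audio_buffer.append` and the `audio` field are borrowed from the general Realtime API and are not confirmed here for the translations endpoint. Mono audio, little-endian byte order, and the 100 ms chunk size are likewise illustrative choices. No network connection is opened.

```python
import base64
import json
import math
import struct

SAMPLE_RATE_HZ = 24_000   # from the text: 24 kHz PCM16
BYTES_PER_SAMPLE = 2      # 16-bit samples; mono is an assumption


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_messages(pcm_bytes, chunk_ms=100, event_type="input_audio_buffer.append"):
    """Yield JSON strings, each carrying one base64-encoded audio chunk.

    The event_type default and the "audio" key are assumptions, not a
    confirmed schema for the translations endpoint.
    """
    bytes_per_chunk = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm_bytes), bytes_per_chunk):
        chunk = pcm_bytes[start:start + bytes_per_chunk]
        yield json.dumps({
            "type": event_type,
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


# One second of a 440 Hz test tone stands in for microphone input.
tone = [0.2 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
        for n in range(SAMPLE_RATE_HZ)]
pcm = float_to_pcm16(tone)
messages = list(audio_messages(pcm))
```

In a live session each message would be sent as it is produced, without waiting for the speaker to finish, which matches the continuous-streaming design described above. With WebRTC this step is unnecessary, since the audio travels as a media track.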

Typical use cases include simultaneous interpretation, multilingual broadcasts, meetings, lessons, conference calls, and customer support. Pricing is based on audio duration (USD per minute) rather than tokens.
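Because billing follows audio duration, a session's cost can be estimated from the amount of audio sent. The sketch below derives minutes from a PCM16 byte count at 24 kHz, assuming mono audio. The per-minute rate is a caller-supplied placeholder, not a published price, and any rounding rules the provider applies are not modelled.

```python
PCM16_BYTES_PER_SECOND = 24_000 * 2  # 24 kHz, 16-bit samples, mono (assumption)


def audio_minutes(pcm_byte_count):
    """Duration in minutes of raw 24 kHz mono PCM16 audio."""
    return pcm_byte_count / PCM16_BYTES_PER_SECOND / 60


def estimate_cost_usd(pcm_byte_count, usd_per_minute):
    """Estimated cost; usd_per_minute is a hypothetical rate supplied by the caller."""
    return audio_minutes(pcm_byte_count) * usd_per_minute


# A 30-minute meeting's worth of audio.
meeting_bytes = 30 * 60 * PCM16_BYTES_PER_SECOND
meeting_minutes = audio_minutes(meeting_bytes)
```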

Classification
Modality: Audio → Audio
Family: GPT

Access & deployment
Access: API
Deployment: Cloud
Weights: Closed

Key parameters
📏 Context: 16K tokens
📥 Input: audio

Technical specification

Context window: 16K tokens
Max output tokens: 2,000 tokens per response
Knowledge cutoff: 30 Sept 2024
Modalities
⬇ Input: audio
⬆ Output: audio, text

Capabilities and applications

Native model capabilities
Live Translation
Real-time speech translation between multiple languages without interrupting the audio stream.
Category: speech
Streaming Speech-to-Text
Real-time conversion of speech to text with immediate output as the speaker is talking.
Category: speech

Technical architecture

Core Architecture