Natively interactive, full-duplex 276B-parameter MoE model (12B active) from Thinking Machines Lab; processes audio, video, and text in 200 ms micro-turns.
⏳ Preview · ⏳ Limited access · Multimodal · Audio · Specialized AI
Parameters: 276B (12B active, MoE)
Release date: 11 May 2026
Access: API · Deployment: ☁ Cloud
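The description above says the model processes input in 200 ms micro-turns. As an illustration only, the sketch below slices a mono audio buffer into 200 ms chunks and runs a loop that emits one output chunk per input chunk, so output can be produced while input is still arriving instead of after the user finishes speaking. The 16 kHz sample rate, the `micro_turns` and `run_full_duplex` helpers, and the `respond` callback are hypothetical assumptions, not an interface documented on this page.

```python
from typing import Callable, Iterator, List

# Hypothetical settings: only the 200 ms micro-turn length comes from the page;
# the 16 kHz mono sample rate is an assumption for illustration.
SAMPLE_RATE_HZ = 16_000
MICRO_TURN_MS = 200
SAMPLES_PER_TURN = SAMPLE_RATE_HZ * MICRO_TURN_MS // 1000  # 3200 samples


def micro_turns(samples: List[int]) -> Iterator[List[int]]:
    """Yield consecutive 200 ms chunks; the final chunk is zero-padded."""
    for start in range(0, len(samples), SAMPLES_PER_TURN):
        chunk = samples[start:start + SAMPLES_PER_TURN]
        # Pad so every micro-turn has the same length.
        yield chunk + [0] * (SAMPLES_PER_TURN - len(chunk))


def run_full_duplex(
    samples: List[int],
    respond: Callable[[List[int]], List[int]],
) -> List[List[int]]:
    """Hypothetical full-duplex loop: each tick consumes one input chunk and
    immediately emits one output chunk (possibly silence), rather than
    waiting for the end of the user's utterance."""
    outputs: List[List[int]] = []
    for chunk in micro_turns(samples):
        outputs.append(respond(chunk))
    return outputs


if __name__ == "__main__":
    # Stand-in "model" that always answers with silence.
    silent = lambda chunk: [0] * len(chunk)
    ticks = run_full_duplex(list(range(7000)), silent)
    print(len(ticks), "micro-turns")
```

With the stand-in `respond`, a 7000-sample buffer (0.4375 s at the assumed 16 kHz) yields three micro-turns, the last one zero-padded. Fixed-length chunks keep the per-tick cost constant, which is what makes a steady 200 ms cadence plausible in a sketch like this.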
Overview
Classification: Multimodal, Audio, Specialized AI
Access & deployment
Access: API
Deployment: Cloud
Weights: Closed
Key parameters
🧩 Parameters: 276B (12B active, MoE)
✓ Tool use
📥 Input: text, audio, video
Technical specification
Features: ✓ Tool use
Modalities
⬇ Input: text, audio, video
⬆ Output: text, audio
Capabilities and applications
Native model capabilities
Voice Conversation
Ability to conduct multi-turn real-time voice conversations with context retention and natural speech pacing.
Category: speech
Speech-to-Text
Category: speech
Text-to-Speech
Category: speech
Streaming Speech-to-Text
Real-time conversion of speech to text, with output emitted while the speaker is still talking.
Category: speech
Live Translation
Real-time speech translation between multiple languages without interrupting the audio stream.
Category: speech
Audio Understanding
Category: audio
Video Understanding
Category: video
Multimodal Understanding
Category: multimodal
Streaming Output
Category: reasoning
Function Calling
Category: planning
Multilingual
Category: language
Reasoning
Category: reasoning
Benchmark results
13 benchmarks
Source for all results: 📄 Thinking Machines Lab blog (May 2026)
FD-bench V1 · turn-taking latency: 0.40 s
FD-bench V1.5 · average quality: 77.8 points
FD-bench V3 · response quality: 82.8%
Audio MultiChallenge · APR: 43.4%
BigBench Audio · accuracy: 75.7%
IFEval (VoiceBench) · accuracy: 82.1%
IFEval (Text) · accuracy: 89.7%
HarmBench · refusal rate: 99.0%
TimeSpeak (internal) · macro accuracy: 64.7%
CueSpeak (internal) · macro accuracy: 81.7%
RepCount-A · off-by-one accuracy: 35.4%
ProactiveVideoQA · PAUC@ω=0.5: 33.5 points
Charades · mIoU: 32.4 points
Technical architecture
Core Architecture
Model Form: Mixture-of-Experts (MoE), 276B total parameters, 12B active
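Only 12B of the 276B parameters are active, so per-token compute scales with the active count while weight storage scales with the total, since in a typical MoE any token may be routed to any expert. The arithmetic below makes that concrete under two generic rules of thumb that are assumptions, not published figures for this model: bf16 weights at 2 bytes per parameter, and roughly 2 FLOPs per active parameter per token for a forward pass.

```python
# Back-of-the-envelope MoE figures. The parameter counts come from the page;
# BYTES_PER_PARAM (bf16) and the 2-FLOPs-per-active-parameter rule are
# generic assumptions, not numbers published for this model.
TOTAL_PARAMS = 276e9
ACTIVE_PARAMS = 12e9
BYTES_PER_PARAM = 2  # assumed bf16 storage

# Share of the network used for any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# All experts must be stored, so memory follows the total parameter count.
weight_bytes_bf16 = TOTAL_PARAMS * BYTES_PER_PARAM

# Forward-pass compute follows the active parameter count.
flops_per_token = 2 * ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")
print(f"bf16 weights: {weight_bytes_bf16 / 1e9:.0f} GB")
print(f"forward FLOPs/token: {flops_per_token / 1e9:.0f} GFLOPs")
```

Run as-is, it prints an active fraction of 4.3%, 552 GB of bf16 weights, and about 24 GFLOPs per token. Real serving footprints also depend on activations, caches, and any weight quantization, so treat these as order-of-magnitude estimates.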
Sources and related pages
4 sources
Blog: Interaction Models: A Scalable Approach to Human-AI Collaboration
Web: Thinking Machines wants to build an AI that actually listens while it talks (TechCrunch)
Web: Thinking Machines shows off preview of near-realtime AI voice and video conversation (VentureBeat)
Web: Thinking Machines drops a new, highly responsive model designed for humanlike interactions (SiliconANGLE)