Gemini Robotics-ER 1.6
Multimodal, Robotics FM
A vision-language model from Google DeepMind with advanced spatial and embodied reasoning, designed for robotics applications.
Technical specification
Context window
Max output
Tools
Fine-tuning
Weights access
Last updated: May 2, 2026
Modalities
Input: Text, Image, Audio, Video
Output: Text
Capabilities (9)
Reasoning ★ (Reasoning)
Multi-step reasoning ★ (Reasoning)
Planning ★ (Planning)
Image understanding ★ (Vision)
Multimodal understanding ★ (Multimodality)
Function calling (Planning)
Structured output ★ (Structured generation)
Video understanding (Other)
Audio understanding (Audio)
Architecture and technologies
Core architecture
Form / family (1)
Training techniques (2)
Applications (1)
Sources
Website (1), Blog (1), Technical report (1), Research paper (1)
