Gemini Robotics 1.5
Multimodal · Robotics FM · VLA
Vision-Language-Action (VLA) model by Google DeepMind that converts visual inputs and language instructions into motor commands for robots.
Technical specification
- Context window
- Tools
- Fine-tuning
- Weights access
Last updated: May 2, 2026
Modalities
- Input: Text, Image
- Output: Text, Action
Capabilities (6)
- Reasoning★ (Reasoning)
- Multi-step reasoning★ (Reasoning)
- Planning★ (Planning)
- Image understanding★ (Vision)
- Multimodal understanding★ (Multimodality)
- Multilingual★ (Language)
