Robotics

Dexterous Manipulation

2019 · Active · Published: 16 June 2026 · Updated: 16 June 2026

Key innovation

Moving object manipulation from simple grippers to multi-fingered end-effectors driven by learned neural policies that combine visual and tactile perception.

How it works

A modern dexterous-manipulation pipeline consists of five layers:

(1) Perception: RGB-D cameras, depth sensors, and fingertip tactile sensors (e.g. GelSight, DIGIT) provide observations of object pose and contact.

(2) State representation: a VLA model or a separate encoder (CNN, ViT, point-cloud network) compresses raw sensor data into a compact state vector.

(3) Policy: a neural network (MLP, transformer, diffusion policy) produces an action vector at each step, typically joint targets for all hand DoFs.

(4) Training: imitation learning from teleoperated demonstrations, RL in simulation (Isaac Gym, MuJoCo) with domain randomization for sim-to-real transfer, or a hybrid IL+RL approach (residual policies).

(5) Execution: a low-level controller (joint-impedance or operational-space control) converts actions into actuator torques at 100–1000 Hz.
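
The split between a slow policy and a fast low-level controller can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gains, the 20 Hz policy rate, the 500 Hz control rate, the unit-free single-inertia joint model, and the names `ImpedanceGains`, `impedance_torque`, `run_episode` and `constant_policy`. It is not taken from any particular robot stack.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImpedanceGains:
    kp: float  # joint stiffness (N*m/rad), illustrative value
    kd: float  # joint damping (N*m*s/rad), illustrative value


def impedance_torque(q_target: List[float], q: List[float],
                     qd: List[float], gains: ImpedanceGains) -> List[float]:
    """Joint-impedance law: a spring toward the target plus a damper on velocity."""
    return [gains.kp * (qt - qi) - gains.kd * vi
            for qt, qi, vi in zip(q_target, q, qd)]


def run_episode(policy: Callable[[List[float]], List[float]],
                n_joints: int = 4, policy_hz: int = 20, control_hz: int = 500,
                seconds: float = 2.0, inertia: float = 0.01) -> List[float]:
    """Query the policy at policy_hz while the torque loop runs at control_hz."""
    gains = ImpedanceGains(kp=4.0, kd=0.4)  # critically damped for this inertia
    dt = 1.0 / control_hz
    steps_per_action = control_hz // policy_hz
    q = [0.0] * n_joints    # joint positions (rad)
    qd = [0.0] * n_joints   # joint velocities (rad/s)
    q_target = list(q)
    for step in range(int(seconds * control_hz)):
        if step % steps_per_action == 0:
            q_target = policy(q)  # slow loop: new joint targets
        tau = impedance_torque(q_target, q, qd, gains)  # fast loop: torques
        # Toy joint model (assumption): independent joints, semi-implicit Euler.
        qd = [v + (t / inertia) * dt for v, t in zip(qd, tau)]
        q = [p + v * dt for p, v in zip(q, qd)]
    return q


def constant_policy(q: List[float]) -> List[float]:
    """Stand-in for a learned policy: always requests 0.5 rad on every joint."""
    return [0.5] * len(q)


final_q = run_episode(constant_policy)
print(final_q)
```

In a real system the stand-in policy would be replaced by the learned network, and the toy joint model by the robot or simulator.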

Problem solved

Classical two-finger grippers fail at objects with complex shapes, at manipulation requiring re-orientation in-hand, and at delicate operations with force control. Dexterous Manipulation addresses the problem of general, adaptive object manipulation in unstructured environments — a prerequisite for humanoids, home robots, and advanced industrial automation.

Components

Multi-fingered end-effector

Actuation: translates the policy's action vector into physical motion.

A mechanical hand with many degrees of freedom (Shadow Hand: 24 DoF, Allegro: 16, Inspire: 12, robotic hands of Tesla/Figure humanoids: 11–17 DoF). The physical interface between the policy and the world.

Tactile sensing

Closing the control loop on contact information.

Fingertip sensors (GelSight, DIGIT, ReSkin, piezoresistive arrays) measure contact force, slip and local surface geometry. Critical for tasks that require force control.
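
As a rough illustration of closing the loop on contact information, the sketch below raises grip force when the measured tangential load approaches the Coulomb friction limit. The friction model, the threshold, the step size and the function names `slip_margin` and `update_grip_force` are assumptions chosen for illustration; real tactile sensors need their own calibration to produce force estimates.

```python
def slip_margin(f_normal: float, f_tangential: float, mu: float) -> float:
    """Fraction of the Coulomb friction limit in use; values >= 1 predict slip."""
    if f_normal <= 0.0:
        return float("inf")  # no normal force means nothing resists sliding
    return abs(f_tangential) / (mu * f_normal)


def update_grip_force(f_normal: float, f_tangential: float, mu: float,
                      threshold: float = 0.8, step: float = 0.2,
                      f_max: float = 10.0) -> float:
    """Increase the commanded normal force (N) when the contact is close to slipping."""
    if slip_margin(f_normal, f_tangential, mu) > threshold:
        return min(f_normal + step, f_max)  # squeeze harder, capped for delicate objects
    return f_normal


# A contact at the friction limit triggers a force increase; a safe one does not.
print(update_grip_force(1.0, 0.5, mu=0.5))
print(update_grip_force(2.0, 0.5, mu=0.5))
```

The cap `f_max` is what makes this usable for delicate objects: the controller never squeezes beyond a set limit, even if slip persists.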

Manipulation policy

Decision-making core: produces the action vector at each step.

A neural network mapping observations to actions. Modern variants: MLP/transformer for RL tasks, diffusion policy for imitation learning, VLA models for tasks requiring natural-language reasoning.
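
The observation-to-action mapping can be sketched with a small MLP in plain Python. The weights here are random rather than trained, and the layer sizes, joint limits and helper names `make_mlp` and `policy_action` are hypothetical. The 16 outputs match the Allegro hand's DoF count mentioned above. The sketch only shows the shape of the computation, not a working controller.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def make_mlp(sizes: List[int], seed: int = 0) -> List[Layer]:
    """Build randomly initialised layers; a trained policy would load weights instead."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def policy_action(layers: List[Layer], obs: List[float],
                  q_low: List[float], q_high: List[float]) -> List[float]:
    """Map an observation vector to joint targets inside the joint limits."""
    h = list(obs)
    for weights, biases in layers:
        h = [sum(w * x for w, x in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        h = [math.tanh(v) for v in h]  # tanh on every layer, so outputs lie in (-1, 1)
    # Rescale each output from (-1, 1) to that joint's [low, high] range.
    return [lo + 0.5 * (v + 1.0) * (hi - lo)
            for v, lo, hi in zip(h, q_low, q_high)]


n_obs, n_joints = 8, 16  # 16 joint targets, as on a 16-DoF hand
layers = make_mlp([n_obs, 32, n_joints])
q_low, q_high = [0.0] * n_joints, [1.6] * n_joints  # illustrative limits (rad)
action = policy_action(layers, [0.1] * n_obs, q_low, q_high)
print(len(action))
```

Squashing the output with tanh and rescaling to joint limits is one common way to keep actions physically valid; other policy classes, such as diffusion policies, produce actions differently.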

Sim-to-real transfer

Bridge between cheap, parallel simulated training and costly physical execution.

A mechanism that lets policies trained in simulation work on a physical robot. Typically domain randomization (sampling friction, masses, latencies), domain adaptation, residual policy, or fine-tuning on a small number of real-world demonstrations.
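
Domain randomization as described above amounts to drawing new dynamics parameters for each training episode. The following is a minimal sketch: the parameter ranges are made up for illustration, and `DynamicsParams`, `RANGES` and `sample_dynamics` are hypothetical names rather than part of any simulator's API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicsParams:
    friction: float          # contact friction coefficient
    object_mass_kg: float    # mass of the manipulated object
    action_latency_s: float  # delay between policy output and actuation


# Illustrative ranges (assumptions); real ranges come from measuring the hardware.
RANGES = {
    "friction": (0.5, 1.5),
    "object_mass_kg": (0.03, 0.3),
    "action_latency_s": (0.0, 0.04),
}


def sample_dynamics(rng: random.Random) -> DynamicsParams:
    """Draw one set of dynamics parameters uniformly from the configured ranges."""
    return DynamicsParams(**{name: rng.uniform(lo, hi)
                             for name, (lo, hi) in RANGES.items()})


rng = random.Random(0)
for episode in range(3):
    params = sample_dynamics(rng)
    # A training loop would configure the simulator with `params` here.
    print(episode, params)
```

Because the policy never sees the same dynamics twice, it has to learn behaviour that works across the whole range, which is what allows the physical robot to be treated as one more sample.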

Implementation

Reference implementations

OpenAI Dactyl (Rubik's Cube)

ALOHA (Stanford / Mobile ALOHA)

Implementation pitfalls

Sim-to-real gap · High

Policies that work perfectly in simulation can fail completely on a physical robot due to differences in friction, actuator latency, sensor noise, and contact dynamics.

Fix: Domain randomization (sampling dynamics parameters during training), domain adaptation, residual policies fine-tuned on real data, system identification.
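
One of these fixes, the residual policy, can be sketched in a few lines: a base policy trained in simulation proposes an action, and a small correction learned on real data is added on top. The scale factor, the action bounds and the name `residual_action` are assumptions for illustration.

```python
from typing import List


def residual_action(base_action: List[float], residual: List[float],
                    scale: float = 0.1, low: float = -1.0,
                    high: float = 1.0) -> List[float]:
    """Add a scaled correction to the base action and clip to the action bounds."""
    return [min(max(b + scale * r, low), high)
            for b, r in zip(base_action, residual)]


# The base action comes from the simulation-trained policy; the residual would come
# from a small network fine-tuned on real-robot data (both are hard-coded here).
print(residual_action([0.5, 0.95], [1.0, 1.0]))
```

Keeping `scale` small limits how far the correction can move the robot away from the simulation-trained behaviour, which helps when only a little real data is available.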

Reward hacking in RL · Medium

The policy finds unexpected ways to maximise reward (e.g. spinning a cube by finger vibration instead of coordinated grasping).

Fix: A carefully shaped reward function, curriculum learning, or warm-starting RL from an imitation-learned policy.
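
As an illustration of reward shaping against the vibration exploit, the sketch below subtracts a penalty on step-to-step action changes from a task term based on orientation error. The specific terms, the weight and the name `shaped_reward` are assumptions for illustration, not a published reward function.

```python
from typing import List


def shaped_reward(angle_error_rad: float, action: List[float],
                  prev_action: List[float], smooth_weight: float = 0.1) -> float:
    """Task term (negative orientation error) minus a penalty on action jitter."""
    jitter = sum((a - p) ** 2 for a, p in zip(action, prev_action))
    return -abs(angle_error_rad) - smooth_weight * jitter


# Same orientation error, but the vibrating action sequence scores lower.
print(shaped_reward(0.2, [0.0, 0.0], [0.0, 0.0]))
print(shaped_reward(0.2, [1.0, -1.0], [-1.0, 1.0]))
```

The weight trades off task progress against smoothness; set too high, it can discourage the fast finger motions that legitimate reorientation needs.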

Missing tactile data in simulation · Medium

Most simulators do not model tactile sensors with sufficient fidelity — preventing pure-sim learning of force-controlled tasks.

Fix: Use simulators with tactile models (TACTO, Taxim); collect real-world demonstrations with tactile sensors and train by imitation learning; or combine simulated and real data through residual learning.

Evolution

Original paper · 2019 · arXiv 2019 (OpenAI) · OpenAI et al.

Solving Rubik's Cube with a Robot Hand

1985

Classical theoretical foundations

Foundational work by Mason, Salisbury and Bicchi on form- and force-closure grasps, hand kinematics, and grasp analysis.

1997

Shadow Dexterous Hand

First commercial 24-DoF anthropomorphic multi-fingered end-effector (Shadow Robot Company) — the de facto standard for dexterous-manipulation research.

2017

Dex-Net (Berkeley) — analytic + learned grasping

Ken Goldberg's group showed that deep learning on synthetic grasp datasets transfers effectively to physical robots — a foundation for modern learned grasping.

2019

OpenAI Dactyl solves Rubik's Cube

Inflection point

A neural policy trained in massively parallel simulation with automatic domain randomization manipulated a Rubik's Cube to a solved state using a single Shadow Hand, a landmark RL result in dexterous manipulation.

Solving Rubik's Cube with a Robot Hand (paper)

2021

Isaac Gym — massively parallel simulation

NVIDIA released GPU-native simulation with thousands of parallel environments; training dexterous-manipulation policies dropped from days to hours.

2024

ALOHA, DexCap and large-scale imitation learning

Inflection point

Teams at Stanford and Berkeley showed that demonstrations collected with low-cost teleoperation rigs (ALOHA, DexCap) are enough to train effective imitation-learning policies without RL, an alternative to the RL plus sim-to-real path.

2024

VLA models (RT-2, π0, GR00T) for manipulation

Inflection point

Vision-Language-Action models with billions of parameters, trained on large robotic-demonstration corpora, became a leading approach to manipulation by combining natural-language understanding with policy generation.