Training
Imitation Learning / Behavior Cloning
1991 · Active · Updated: 5 May 2026 · Published
Key innovation
Learning an agent policy directly from expert demonstrations rather than from a hand-designed reward function, eliminating reward engineering in robotics.
Category
Training
Abstraction level
Pattern
Use cases
Robotic policy training · Object manipulation · Autonomous navigation · Robotic arm control · Fine-tuning foundation models on human data
How it works
Pairs of (observation, action) are collected from expert demonstrations. A model (policy network) is trained to map observations to actions by minimising MSE for continuous actions or cross-entropy for discrete ones. In BC the model learns off-policy, without any environment interaction during training. More advanced variants such as DAgger query the expert in the loop to correct the errors caused by distribution shift.
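A minimal sketch in PyTorch of both steps described above, assuming a continuous-action task. PolicyNet, behavior_cloning, dagger, and the env_rollout / expert_label helpers are illustrative names introduced here, not from the source.

# Minimal sketch of Behavior Cloning plus a DAgger outer loop (assumptions:
# continuous actions, in-memory (observation, action) demonstration tensors).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class PolicyNet(nn.Module):
    """Policy network mapping observations to actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def behavior_cloning(obs: torch.Tensor, acts: torch.Tensor,
                     epochs: int = 10, lr: float = 1e-3) -> PolicyNet:
    """Plain BC: supervised and off-policy, no environment interaction."""
    policy = PolicyNet(obs.shape[1], acts.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(obs, acts), batch_size=64, shuffle=True)
    loss_fn = nn.MSELoss()  # swap for cross-entropy with discrete actions
    for _ in range(epochs):
        for batch_obs, batch_acts in loader:
            loss = loss_fn(policy(batch_obs), batch_acts)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy


def dagger(env_rollout, expert_label, obs, acts, iters: int = 5) -> PolicyNet:
    """DAgger sketch: train, roll out, query the expert on visited states,
    aggregate, retrain. env_rollout(policy) -> visited observations and
    expert_label(obs) -> expert actions are assumed helper callables."""
    for _ in range(iters):
        policy = behavior_cloning(obs, acts)
        visited = env_rollout(policy)                    # on-policy states
        obs = torch.cat([obs, visited])                  # dataset aggregation
        acts = torch.cat([acts, expert_label(visited)])  # expert corrections
    return policy


# Synthetic stand-in data; real pairs come from expert demonstrations.
demo_obs = torch.randn(1000, 10)  # 1000 demonstrations, 10-dim observations
demo_act = torch.randn(1000, 4)   # matching 4-dim expert actions
policy = behavior_cloning(demo_obs, demo_act)

Retraining from scratch on the aggregated dataset each iteration follows the original DAgger formulation; in practice the policy is often warm-started from the previous iteration instead.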
Problem solved
Difficulty of defining reward functions for complex robotic tasks; need for efficient skill transfer from human demonstrations.
Evolution
Original paper · Dean A. Pomerleau, "Efficient Training of Artificial Neural Networks for Autonomous Navigation", Neural Computation, 1991
1991
ALVINN (Pomerleau): first demonstration of Behavior Cloning for autonomous navigation
Inflection point · 2011
DAgger (Ross et al.): iterative dataset aggregation mitigates the distribution-shift problem in BC
Inflection point · 2023
Open-X-Embodiment: scaling IL to over a million robotic demonstrations across diverse robot platforms
Inflection point · 2025
UnifoLM-WMA-0 applies IL/BC as Policy Enhancement on Open-X data
Technical details
Hardware requirements
Primary
Training neural network policies on large demonstration datasets requires GPUs.