Training
Imitation Learning / Behavior Cloning
1991 · Active · Updated: 5 May 2026 · Published
Key innovation
Learning an agent policy directly from expert demonstrations rather than from a hand-designed reward function, eliminating reward engineering in robotics.
Category
Training
Abstraction level
Pattern
Use cases
Robotic policy training · Object manipulation · Autonomous navigation · Robotic arm control · Fine-tuning foundation models on human data
How it works
Pairs of (observation, action) are collected from expert demonstrations. A model (policy network) is trained to map observations to actions by minimising MSE for continuous actions or cross-entropy for discrete ones. In BC the model learns off-policy, without any environment interaction during training. More advanced variants such as DAgger query the expert in the loop to correct the errors caused by distribution shift.
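A minimal sketch in PyTorch of both steps described above, assuming a continuous-action task. PolicyNet, behavior_cloning, dagger, and the env_rollout / expert_label helpers are illustrative names introduced here, not from the source.

# Minimal sketch of Behavior Cloning plus a DAgger outer loop (assumptions:
# continuous actions, in-memory (observation, action) demonstration tensors).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class PolicyNet(nn.Module):
    """Policy network mapping observations to actions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def behavior_cloning(obs: torch.Tensor, acts: torch.Tensor,
                     epochs: int = 10, lr: float = 1e-3) -> PolicyNet:
    """Plain BC: supervised and off-policy, no environment interaction."""
    policy = PolicyNet(obs.shape[1], acts.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(obs, acts), batch_size=64, shuffle=True)
    loss_fn = nn.MSELoss()  # swap for cross-entropy with discrete actions
    for _ in range(epochs):
        for batch_obs, batch_acts in loader:
            loss = loss_fn(policy(batch_obs), batch_acts)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy


def dagger(env_rollout, expert_label, obs, acts, iters: int = 5) -> PolicyNet:
    """DAgger sketch: train, roll out, query the expert on visited states,
    aggregate, retrain. env_rollout(policy) -> visited observations and
    expert_label(obs) -> expert actions are assumed helper callables."""
    for _ in range(iters):
        policy = behavior_cloning(obs, acts)
        visited = env_rollout(policy)                    # on-policy states
        obs = torch.cat([obs, visited])                  # dataset aggregation
        acts = torch.cat([acts, expert_label(visited)])  # expert corrections
    return policy


# Synthetic stand-in data; real pairs come from expert demonstrations.
demo_obs = torch.randn(1000, 10)  # 1000 demonstrations, 10-dim observations
demo_act = torch.randn(1000, 4)   # matching 4-dim expert actions
policy = behavior_cloning(demo_obs, demo_act)

Retraining from scratch on the aggregated dataset each iteration follows the original DAgger formulation; in practice the policy is often warm-started from the previous iteration instead.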
Problem solved
Difficulty of defining reward functions for complex robotic tasks; need for efficient skill transfer from human demonstrations.
Evolution
Original paper · Dean A. Pomerleau, "Efficient Training of Artificial Neural Networks for Autonomous Navigation", Neural Computation, 1991
1991
ALVINN (Pomerleau): first demonstration of Behavior Cloning for autonomous navigation
Inflection point · 2011
DAgger (Ross et al.): iterative dataset aggregation mitigates the distribution-shift problem in BC
Inflection point · 2023
Open-X-Embodiment: scaling IL to over a million robotic demonstrations across diverse robot platforms
Inflection point · 2025
UnifoLM-WMA-0 applies IL/BC as Policy Enhancement on Open-X data
Technical details
Hardware requirements
Primary
Training neural network policies on large demonstration datasets requires GPUs.