
SFT (Supervised Fine-Tuning)

Published: 2022 · Status: Active · Updated: 6 May 2026
Key innovation
Enabled adaptation of large pre-trained language models to specific tasks and instruction-following behavior using relatively small, labeled datasets of demonstrations.
Category
Training
Abstraction level
Pattern
Operation level
Training · Post-training
Use cases
Chatbots and language assistants
Instruction model fine-tuning
First stage of RLHF
Domain specialization of models

How it works

The SFT dataset consists of (prompt p, response y) pairs. The loss is the negative log-likelihood of the response tokens given the prompt, L(θ) = -∑_t log P_θ(y_t | p, y_{<t}), and the model is trained on it with gradient descent, typically at a small learning rate. Parameter-efficient techniques such as LoRA or QLoRA are often used to reduce compute costs. Data may come from human annotators (e.g. FLAN, Dolly) or be generated synthetically by a stronger model.
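A minimal sketch of this objective in PyTorch with a Hugging Face causal LM: prompt positions are masked out of the loss with the label value -100, so only response tokens contribute to -∑_t log P_θ(y_t | p, y_{<t}). The model name, strings, and learning rate are illustrative placeholders, not a prescribed setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Translate to French: Hello, world.\n"
response = "Bonjour, le monde."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids

# -100 tells the cross-entropy to ignore a position, so only the
# response tokens are supervised. (In practice you would tokenize once
# and track the prompt/response boundary to avoid edge-token mismatch.)
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small LR, typical for SFT
out = model(input_ids=full_ids, labels=labels)  # HF shifts labels internally
out.loss.backward()   # one gradient step of the SFT loop
optimizer.step()
optimizer.zero_grad()
```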

Problem solved

Pre-trained models are good at text completion but not at following user instructions, answering questions in chat format, or generating safe and helpful responses.

Implementation

Implementation pitfalls
Catastrophic forgetting · High

SFT on a narrow dataset can cause the model to forget previously learned capabilities. Mitigate by mixing in diverse data or adding regularization (see the sketch after this list).

Overfitting on small SFT dataset · Medium

With too few examples or too many epochs, the model memorizes demonstrations rather than generalizing.

Data quality is critical · High

Noisy, inconsistent, or biased SFT data is directly reflected in model behavior. Quality > quantity.
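One common mitigation for the forgetting pitfall above is to blend a fraction of general-purpose instruction data back into the narrow SFT set. The sketch below is a hypothetical recipe; the dataset names and the default 20% fraction are assumptions, not a recommendation from a specific paper.

```python
import random

def mix_datasets(narrow_sft, general, general_fraction=0.2, seed=0):
    """Blend general-purpose examples into a narrow SFT set so that
    roughly `general_fraction` of the result is general data."""
    rng = random.Random(seed)
    # Solve g / (N + g) = f for the number of general examples g.
    n_general = int(len(narrow_sft) * general_fraction / (1.0 - general_fraction))
    mixed = list(narrow_sft) + rng.sample(list(general), min(n_general, len(general)))
    rng.shuffle(mixed)
    return mixed

# Hypothetical usage with lists of (prompt, response) pairs:
# train_set = mix_datasets(medical_qa_pairs, general_instruct_pairs)
```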

Evolution

Original paper · 2022 · NeurIPS 2022 · Long Ouyang
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al.
2018
Pre-training + fine-tuning paradigm (GPT-1, BERT)
Inflection point

Radford et al. and Devlin et al. establish the pre-train/fine-tune paradigm.

2021
FLAN - SFT on instruction datasets

Wei et al. show that fine-tuning on diverse instruction datasets improves zero-shot performance.

2022
InstructGPT - SFT as stage 1 of RLHF
Inflection point

Ouyang et al. formalize SFT as the first step before reward modeling and PPO.

2023
LoRA and QLoRA - efficient SFT

Hu et al. (LoRA, 2021) and Dettmers et al. (QLoRA, 2023) enable SFT on consumer hardware by training only low-rank adapters.
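A minimal sketch of LoRA-based SFT using the `peft` library: only the low-rank adapter weights are trained, while the base model stays frozen. The base model and the `target_modules` entry are illustrative; the correct module names depend on the architecture.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

config = LoraConfig(
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection; model-specific
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights

# `model` now trains only the adapter weights in the usual SFT loop.
```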