Foundation Model
How it works
1) Pretraining: the model learns general-purpose representations from a very large, diverse corpus, typically via self-supervised objectives (e.g., next-token prediction, masked language modeling, contrastive learning).
2) Adaptation: the same pretrained model is then adapted to specific tasks via fine-tuning, instruction tuning, RLHF, prompting, or parameter-efficient adapters such as LoRA.
Scaling parameters, data, and compute together is associated with "emergent capabilities": abilities not observed in smaller models.
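The two stages above can be sketched in a few lines. This is a minimal, hypothetical NumPy illustration (not any library's actual API): first the self-supervised next-token signal, where the corpus supplies its own labels, then a LoRA-style low-rank update, where only two small matrices are trained while the pretrained weight stays frozen.

```python
import numpy as np

# --- 1) Self-supervised pretraining signal: next-token prediction ---
# Inputs are the tokens; targets are the same sequence shifted by one
# position, so no human labels are needed.
tokens = np.array([5, 2, 7, 1, 9])
inputs, targets = tokens[:-1], tokens[1:]
# inputs  -> [5, 2, 7, 1]
# targets -> [2, 7, 1, 9]

# --- 2) Parameter-efficient adaptation: LoRA-style low-rank update ---
# Instead of fine-tuning the full d x d weight matrix W, train two small
# matrices A (r x d) and B (d x r) with rank r << d; the adapted weight
# is W + B @ A, so only 2*r*d parameters are trainable.
d, r = 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init
W_adapted = W + B @ A                # identical to W until B is trained

print(targets.tolist())              # [2, 7, 1, 9]
print(np.allclose(W, W_adapted))     # True: B is zero, so the delta is zero
```

The zero initialization of B is the standard LoRA trick: at the start of adaptation the model behaves exactly like the pretrained one, and the low-rank delta grows only as training updates B.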
Problem solved
Removes the need to train a separate model from scratch for each task: one large, general model can be adapted to many applications at low marginal cost.
Evolution
BERT (Google) and GPT (OpenAI) established the 'pretrain-then-adapt' paradigm as the NLP standard.
GPT-3 demonstrated that scale gives rise to few-shot capabilities without task-specific fine-tuning.
Bommasani et al. (2021) formalize the paradigm and introduce the term "foundation model".
Extension of the paradigm beyond text to images, video, and audio.
Google DeepMind brings the paradigm to robotics by combining vision-language models (VLMs) with robotic manipulation.
Open-weight models become competitive with closed counterparts.