Reasoning

Self-Consistency

Published: 2022 · Active · Updated: 7 May 2026
Key innovation
Replaced greedy decoding in Chain-of-Thought with sampling multiple diverse reasoning paths and selecting the most frequent answer, improving reasoning reliability without additional training.
Category
Reasoning
Abstraction level
Pattern
Operation level
Inference
Use cases
Arithmetic and math tasks · Logical reasoning · Open-ended multi-step reasoning questions · LLM answer correctness verification

How it works

Algorithm: (1) sample k different CoT paths at temperature T > 0; (2) extract the final answer from each path; (3) select the answer by majority vote (the most frequently occurring one). A typical range is k = 5–40 paths. The method requires no additional training or model modification.
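The three steps above can be sketched in a few lines of Python. `sample_cot` and `extract_answer` are hypothetical stand-ins for your model call (with T > 0) and answer parser; the demo below stubs them out with a noisy fake sampler:

```python
import random
from collections import Counter

def self_consistency(sample_cot, extract_answer, prompt, k=10):
    """Sketch of Self-Consistency: sample k CoT paths, majority-vote the answers.

    sample_cot(prompt) and extract_answer(path) are caller-supplied stand-ins
    for a temperature-sampled model call and a final-answer parser.
    """
    answers = [extract_answer(sample_cot(prompt)) for _ in range(k)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / k  # answer plus its vote share

# Toy demo: a stubbed sampler whose paths mostly (but not always) end in "18".
def fake_sample(prompt):
    return "... therefore the answer is " + random.choice(["18", "18", "18", "20"])

def fake_extract(path):
    return path.rsplit(" ", 1)[-1]

answer, share = self_consistency(fake_sample, fake_extract, "Q: 3 * 6 = ?", k=20)
```

With a real model, `sample_cot` would issue k independent generation calls with the same prompt and a non-zero temperature.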

Problem solved

Greedy decoding in Chain-of-Thought is sensitive to errors in a single reasoning path: one wrong step propagates to the final answer.

Implementation

Implementation pitfalls
Cost grows linearly with k · Medium

Sampling k paths multiplies inference cost by k, which can be expensive for large models and long reasoning chains.

Fix: Choose k adaptively (e.g. early stopping once most paths already agree), or use a smaller k for easier tasks.
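One way to make k adaptive, sketched below under the assumption that you can draw answers one at a time: stop sampling as soon as the leading answer has an unbeatable lead. `sample_answer` is a hypothetical closure returning one extracted answer per call.

```python
from collections import Counter

def adaptive_self_consistency(sample_answer, k_max=40):
    """Early-stopping sketch: stop once the leader cannot be overtaken even
    if every remaining sample voted for a single rival answer."""
    counts = Counter()
    for drawn in range(1, k_max + 1):
        counts[sample_answer()] += 1
        leader_votes = counts.most_common(1)[0][1]
        remaining = k_max - drawn
        # A rival could at best collect all non-leader votes plus all
        # remaining samples; stop when even that cannot beat the leader.
        if leader_votes > (drawn - leader_votes) + remaining:
            break
    return counts.most_common(1)[0][0], drawn

# Demo: a degenerate sampler that always agrees stops at 21 of 40 samples.
answer, used = adaptive_self_consistency(lambda: "42", k_max=40)
```

This is the conservative (guaranteed-identical-result) stopping rule; in practice a looser threshold such as "stop when one answer holds 80% of the votes so far" saves more compute at a small risk of flipping the outcome.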
Majority voting fails for open-ended answers · Medium

When answers are not discrete and not exact-match comparable (e.g. prose, code, longer explanations), standard majority voting is unusable.

Fix: Use Universal Self-Consistency or an LLM-as-judge to aggregate semantically similar answers.
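The aggregation step can be abstracted over an equivalence predicate, as in this sketch: `equivalent(a, b)` is a caller-supplied stand-in for an embedding-similarity check or an LLM-as-judge call; the demo uses a crude number-matching heuristic purely for illustration.

```python
import re

def vote_with_equivalence(answers, equivalent):
    """Group free-form answers into clusters of mutually equivalent strings,
    then return the representative of the largest cluster and its size."""
    clusters = []  # list of (representative, members)
    for a in answers:
        for cluster in clusters:
            if equivalent(cluster[0], a):
                cluster[1].append(a)
                break
        else:
            clusters.append((a, [a]))
    rep, members = max(clusters, key=lambda c: len(c[1]))
    return rep, len(members)

# Toy equivalence: answers match if they contain the same sequence of numbers.
def same_number(a, b):
    return re.findall(r"\d+", a) == re.findall(r"\d+", b)

rep, votes = vote_with_equivalence(
    ["The result is 12.", "12 apples", "It is 12", "13 maybe"], same_number)
```

With a real judge model, `equivalent` would ask whether two answers express the same conclusion, which is essentially what Universal Self-Consistency delegates to the LLM.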
Requires non-zero temperature · Low

Without sampling diversity (T = 0) all paths are identical and voting adds no information. A non-zero temperature (T > 0) or top-p < 1 is required.

Fix: Use T in the 0.5–0.7 range and verify that the generated reasoning paths actually differ.
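Verifying that paths differ can be as simple as counting distinct strings before voting, as in this sketch (the threshold and error message are illustrative, not from the paper):

```python
def check_diversity(paths, min_distinct=2):
    """Sanity check: with T = 0 all sampled paths collapse to one string and
    majority voting is vacuous. Raise if the paths do not actually differ."""
    distinct = len(set(paths))
    if distinct < min_distinct:
        raise ValueError(
            f"Only {distinct} distinct path(s) out of {len(paths)}; "
            "increase temperature (e.g. T in 0.5-0.7) or loosen top-p.")
    return distinct
```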

Evolution

Original paper · 2022 · ICLR 2023 · Xuezhi Wang
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
2022
Self-Consistency (Wang et al., ICLR 2023)
Inflection point

Wang et al. propose majority-vote over multiple CoT paths, showing +17.9 points on GSM8K over vanilla CoT.

2023
Universal Self-Consistency and extensions

Follow-up work extends Self-Consistency to open-ended tasks where exact-match voting is inapplicable (Universal Self-Consistency, Chen et al., 2023).

Technical details

Hyperparameters (configurable axes)

Number of samples (k) · Critical

Number of independently sampled CoT paths. Increasing k improves answer stability but linearly increases inference cost.

5 · Minimum for a noticeable improvement.
40 · Value used in the experiments of Wang et al. (2022).
Sampling temperature · High

Temperature T controls reasoning-path diversity. T = 0 makes the method useless (no diversity).

0.5–0.7 · Range recommended in the original paper.
Aggregation method · Medium

How path outputs are combined: majority vote (classic), probability-weighted vote, or semantic clustering (Universal Self-Consistency).
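The probability-weighted variant can be sketched as follows, assuming the API exposes per-path token log-probabilities (the tuple format here is an assumption for illustration; Wang et al. report that the unweighted vote performs comparably in practice):

```python
import math
from collections import defaultdict

def weighted_vote(scored_answers):
    """Probability-weighted vote sketch: each path contributes its
    length-normalized generation probability rather than a flat vote of 1.

    scored_answers: list of (answer, sum_logprob, num_tokens) tuples.
    """
    weights = defaultdict(float)
    for answer, sum_logprob, num_tokens in scored_answers:
        # exp(mean token logprob) = per-token probability of the path.
        weights[answer] += math.exp(sum_logprob / num_tokens)
    return max(weights, key=weights.get)

# Two lower-probability paths agreeing on "18" outweigh one confident "20".
best = weighted_vote([("18", -4.0, 10), ("18", -5.0, 10), ("20", -2.0, 10)])
```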

Hardware requirements

Primary

Self-Consistency is a layer on top of LLM inference and is agnostic to the specific hardware. All calls are standard autoregressive generation, which parallelizes well on GPUs and TPUs.