Reasoning

Self-Consistency

Published: 2022 · Active · Updated: 7 May 2026
Key innovation
Replaced greedy decoding in Chain-of-Thought with sampling multiple diverse reasoning paths and selecting the most frequent answer, improving reasoning reliability without additional training.
Category
Reasoning
Abstraction level
Pattern
Operation level
Inference
Use cases
Arithmetic and math tasks · Logical reasoning · Open-ended multi-step reasoning questions · LLM answer correctness verification

How it works

Algorithm: (1) sample k different CoT paths at temperature T > 0; (2) extract the final answer from each path; (3) select the answer by majority vote (the most frequently occurring one). A typical range is k = 5–40 paths. The method requires no additional training or model modification.
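The three steps above can be sketched in a few lines of Python. `sample_cot` and `extract_answer` are hypothetical stand-ins for your model call (with T > 0) and answer parser; the demo below stubs them out with a noisy fake sampler:

```python
import random
from collections import Counter

def self_consistency(sample_cot, extract_answer, prompt, k=10):
    """Sketch of Self-Consistency: sample k CoT paths, majority-vote the answers.

    sample_cot(prompt) and extract_answer(path) are caller-supplied stand-ins
    for a temperature-sampled model call and a final-answer parser.
    """
    answers = [extract_answer(sample_cot(prompt)) for _ in range(k)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / k  # answer plus its vote share

# Toy demo: a stubbed sampler whose paths mostly (but not always) end in "18".
def fake_sample(prompt):
    return "... therefore the answer is " + random.choice(["18", "18", "18", "20"])

def fake_extract(path):
    return path.rsplit(" ", 1)[-1]

answer, share = self_consistency(fake_sample, fake_extract, "Q: 3 * 6 = ?", k=20)
```

With a real model, `sample_cot` would issue k independent generation calls with the same prompt and a non-zero temperature.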

Problem solved

Greedy decoding in Chain-of-Thought is sensitive to errors in a single reasoning path: one wrong step propagates to the final answer.

Implementation

Implementation pitfalls
Cost grows linearly with k · Medium

Sampling k paths multiplies inference cost by k, which can be expensive for large models and long reasoning chains.

Fix: Choose k adaptively (e.g. early stopping once most paths already agree), or use a smaller k for easier tasks.
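One way to make k adaptive, sketched below under the assumption that you can draw answers one at a time: stop sampling as soon as the leading answer has an unbeatable lead. `sample_answer` is a hypothetical closure returning one extracted answer per call.

```python
from collections import Counter

def adaptive_self_consistency(sample_answer, k_max=40):
    """Early-stopping sketch: stop once the leader cannot be overtaken even
    if every remaining sample voted for a single rival answer."""
    counts = Counter()
    for drawn in range(1, k_max + 1):
        counts[sample_answer()] += 1
        leader_votes = counts.most_common(1)[0][1]
        remaining = k_max - drawn
        # A rival could at best collect all non-leader votes plus all
        # remaining samples; stop when even that cannot beat the leader.
        if leader_votes > (drawn - leader_votes) + remaining:
            break
    return counts.most_common(1)[0][0], drawn

# Demo: a degenerate sampler that always agrees stops at 21 of 40 samples.
answer, used = adaptive_self_consistency(lambda: "42", k_max=40)
```

This is the conservative (guaranteed-identical-result) stopping rule; in practice a looser threshold such as "stop when one answer holds 80% of the votes so far" saves more compute at a small risk of flipping the outcome.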
Majority voting fails for open-ended answers · Medium

When answers are not discrete and not exact-match comparable (e.g. prose, code, longer explanations), standard majority voting is unusable.

Fix: Use Universal Self-Consistency or an LLM-as-judge to aggregate semantically similar answers.
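The aggregation step can be abstracted over an equivalence predicate, as in this sketch: `equivalent(a, b)` is a caller-supplied stand-in for an embedding-similarity check or an LLM-as-judge call; the demo uses a crude number-matching heuristic purely for illustration.

```python
import re

def vote_with_equivalence(answers, equivalent):
    """Group free-form answers into clusters of mutually equivalent strings,
    then return the representative of the largest cluster and its size."""
    clusters = []  # list of (representative, members)
    for a in answers:
        for cluster in clusters:
            if equivalent(cluster[0], a):
                cluster[1].append(a)
                break
        else:
            clusters.append((a, [a]))
    rep, members = max(clusters, key=lambda c: len(c[1]))
    return rep, len(members)

# Toy equivalence: answers match if they contain the same sequence of numbers.
def same_number(a, b):
    return re.findall(r"\d+", a) == re.findall(r"\d+", b)

rep, votes = vote_with_equivalence(
    ["The result is 12.", "12 apples", "It is 12", "13 maybe"], same_number)
```

With a real judge model, `equivalent` would ask whether two answers express the same conclusion, which is essentially what Universal Self-Consistency delegates to the LLM.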
Requires non-zero temperature · Low

Without sampling diversity (T = 0) all paths are identical and voting adds no information. A non-zero temperature (T > 0) or top-p < 1 is required.

Fix: Use T in the 0.5–0.7 range and verify that the generated reasoning paths actually differ.
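Verifying that paths differ can be as simple as counting distinct strings before voting, as in this sketch (the threshold and error message are illustrative, not from the paper):

```python
def check_diversity(paths, min_distinct=2):
    """Sanity check: with T = 0 all sampled paths collapse to one string and
    majority voting is vacuous. Raise if the paths do not actually differ."""
    distinct = len(set(paths))
    if distinct < min_distinct:
        raise ValueError(
            f"Only {distinct} distinct path(s) out of {len(paths)}; "
            "increase temperature (e.g. T in 0.5-0.7) or loosen top-p.")
    return distinct
```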

Evolution

Original paper · 2022 · ICLR 2023 · Xuezhi Wang
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
2022
Self-Consistency (Wang et al., ICLR 2023)
Inflection point

Wang et al. propose majority-vote over multiple CoT paths, showing +17.9 points on GSM8K over vanilla CoT.

2023
Universal Self-Consistency and extensions

Follow-up work extends Self-Consistency to open-ended tasks where exact-match voting is inapplicable (Universal Self-Consistency, Chen et al., 2023).

Technical details

Hyperparameters (configurable axes)

Number of samples (k) · Critical

Number of independently sampled CoT paths. Increasing k improves answer stability but linearly increases inference cost.

5 · Minimum for a noticeable improvement.
40 · Value used in the experiments of Wang et al. (2022).
Sampling temperature · High

Temperature T controls reasoning-path diversity. T = 0 makes the method useless (no diversity).

0.5–0.7 · Range recommended in the original paper.
Aggregation method · Medium

How path outputs are combined: majority vote (classic), probability-weighted vote, or semantic clustering (Universal Self-Consistency).
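The probability-weighted variant can be sketched as follows, assuming the API exposes per-path token log-probabilities (the tuple format here is an assumption for illustration; Wang et al. report that the unweighted vote performs comparably in practice):

```python
import math
from collections import defaultdict

def weighted_vote(scored_answers):
    """Probability-weighted vote sketch: each path contributes its
    length-normalized generation probability rather than a flat vote of 1.

    scored_answers: list of (answer, sum_logprob, num_tokens) tuples.
    """
    weights = defaultdict(float)
    for answer, sum_logprob, num_tokens in scored_answers:
        # exp(mean token logprob) = per-token probability of the path.
        weights[answer] += math.exp(sum_logprob / num_tokens)
    return max(weights, key=weights.get)

# Two lower-probability paths agreeing on "18" outweigh one confident "20".
best = weighted_vote([("18", -4.0, 10), ("18", -5.0, 10), ("20", -2.0, 10)])
```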

Hardware requirements

Primary

Self-Consistency is a layer on top of LLM inference and is agnostic to the specific hardware. All calls are standard autoregressive generation, which parallelizes well on GPUs and TPUs.