Self-Consistency
How it works
Algorithm: (1) sample k different CoT paths with temperature T > 0; (2) extract the final answer from each path; (3) select the answer by majority vote (the most frequently occurring one). Typical values are k = 5-40 paths. The method requires no additional training or model modification.
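A minimal sketch of the loop, assuming a hypothetical sample_cot(prompt, temperature) function that returns one reasoning path as text, and a GSM8K-style task where the final answer can be extracted with a regex:

```python
import re
from collections import Counter

def extract_answer(path: str) -> str | None:
    """Pull the final numeric answer from a reasoning path.
    Assumes the path ends with something like 'The answer is 42.'"""
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", path, re.IGNORECASE)
    return match.group(1) if match else None

def self_consistency(prompt: str, k: int = 20, temperature: float = 0.7) -> str | None:
    """Sample k CoT paths, extract an answer from each, majority-vote."""
    answers = []
    for _ in range(k):
        path = sample_cot(prompt, temperature=temperature)  # hypothetical LLM call
        answer = extract_answer(path)
        if answer is not None:  # paths without a parseable answer are dropped
            answers.append(answer)
    if not answers:
        return None
    # Majority vote: the most frequent extracted answer wins.
    return Counter(answers).most_common(1)[0][0]
```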
Problem solved
Greedy decoding in Chain-of-Thought is sensitive to errors in a single reasoning path: one wrong step propagates to the final answer.
Implementation
Sampling k paths multiplies inference cost by k, which can be expensive for large models and long reasoning chains.
When answers are not discrete and cannot be compared by exact match (e.g. prose, code, longer explanations), standard majority voting breaks down.
Without sampling diversity (T = 0), all paths are identical and voting adds no information; stochastic decoding is required, i.e. temperature T > 0, optionally combined with nucleus sampling (top-p < 1).
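For concreteness, with the Hugging Face transformers API the diversity requirement maps onto the standard sampling flags; the model name below is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: A farmer has 3 fields with 12 rows of 8 plants each... Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=True with T > 0 creates the diversity that voting depends on;
# greedy decoding (do_sample=False) would return k identical sequences.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    num_return_sequences=20,  # k paths sampled in one batched call
    max_new_tokens=256,
)
paths = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```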
Evolution
Wang et al. (2022) propose majority voting over multiple sampled CoT paths, reporting a gain of +17.9 points on GSM8K over standard CoT.
Follow-up work extends Self-Consistency to open-ended tasks where exact-match voting is inapplicable (Universal Self-Consistency, Chen et al., 2023).
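Universal Self-Consistency replaces exact-match voting with an LLM-based selection step: all candidate responses are shown back to the model, which is asked to pick the most consistent one. A minimal sketch, reusing the hypothetical sample_cot from above plus a hypothetical llm(prompt) completion call:

```python
import re

def universal_self_consistency(prompt: str, k: int = 8) -> str:
    """Free-form variant: the model itself selects the most consistent
    of its own candidate responses instead of exact-match voting."""
    candidates = [sample_cot(prompt, temperature=0.7) for _ in range(k)]
    numbered = "\n\n".join(
        f"Response {i + 1}:\n{c}" for i, c in enumerate(candidates)
    )
    selection_prompt = (
        f"I asked the following question:\n{prompt}\n\n"
        f"Here are {k} candidate responses:\n\n{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        "Reply with the response number only."
    )
    choice = llm(selection_prompt)  # hypothetical completion call
    index = int(re.search(r"\d+", choice).group()) - 1
    return candidates[index]
```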
Technical details
Hyperparameters (configurable axes)
k: number of independently sampled CoT paths. Increasing k improves answer stability but increases inference cost linearly.
Temperature T: controls reasoning-path diversity. T = 0 collapses all samples to the single greedy path, making the method useless.
Aggregation: how path outputs are combined, e.g. majority vote (classic), probability-weighted vote, or semantic clustering (Universal Self-Consistency); a probability-weighted sketch follows this list.
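A sketch of the probability-weighted variant, assuming each sampled path comes with its sequence log-probability (obtainable from per-token logprobs); answers are scored by the probability mass of the paths that produced them rather than counted once each:

```python
import math
from collections import defaultdict

def weighted_vote(paths: list[tuple[str, float]]) -> str:
    """paths: (extracted_answer, sequence_logprob) pairs.
    Each answer's score is the summed probability of its paths."""
    scores: dict[str, float] = defaultdict(float)
    for answer, logprob in paths:
        scores[answer] += math.exp(logprob)
    return max(scores, key=scores.get)

# One high-probability path saying "41" outweighs three low-probability
# "42" paths; a plain (unweighted) majority vote would have picked "42".
print(weighted_vote([("42", -4.0), ("42", -4.2), ("42", -4.1), ("41", -2.5)]))
```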
Hardware requirements
Self-Consistency is a layer on top of LLM inference — agnostic to specific hardware. All calls are standard autoregressive generation, which parallelizes well on GPU and TPU.