Self-Consistency Decoding
Decoding strategy that samples multiple reasoning paths from an LLM at non-zero temperature and aggregates them — typically by majority vote on the final answer — instead of taking a single greedy chain-of-thought. Improves accuracy on reasoning tasks and yields a cheap confidence proxy via vote share.
"Self-consistency" decoding is a technique introduced by Wang et al. (2022) that improves chain-of-thought reasoning in LLMs by sampling many independent reasoning paths at non-zero temperature and aggregating their final answers, usually by majority vote. The intuition is that a hard problem may admit many distinct correct derivations, all of which arrive at the same answer, whereas flawed derivations tend to scatter across different wrong answers; sampling diversifies the reasoning, and agreement across paths concentrates on the correct answer. Self-consistency was first demonstrated on arithmetic and commonsense reasoning benchmarks such as GSM8K, where it produced double-digit accuracy gains over greedy chain-of-thought decoding on the same model. It has since become a default ingredient in reasoning pipelines, often combined with verifier models, tree-of-thought search, or weighted aggregation schemes that score paths by intermediate-step quality. Beyond accuracy, the vote share of the winning answer (the fraction of sampled paths that agree with it) is a cheap, model-agnostic confidence signal that remains usable when token probabilities are unavailable, and it ties directly into Confidence Calibration in LLM Outputs. The method has known failure modes: on prompts where the model is confidently wrong, sampling tends to reproduce the same error, so vote share saturates near 100% on incorrect answers. Recent extensions, including cross-model ensembles, "mirror-consistency," and self-generated distractors, try to detect these collapses by measuring semantic disagreement across heterogeneous samples rather than counting exact-match votes.
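The majority-vote aggregation and vote-share confidence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation from Wang et al.: `sample_fn` is a hypothetical stand-in for one temperature-sampled LLM call, the `Answer:` marker convention and the helpers `extract_answer` and `make_stub_sampler` are invented for the example, and answers are compared by naive exact string match.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Optional, Tuple


def extract_answer(path: str) -> Optional[str]:
    """Hypothetical parser: return the text after the last 'Answer:' marker."""
    marker = "Answer:"
    idx = path.rfind(marker)
    if idx == -1:
        return None  # this reasoning path never stated a final answer
    answer = path[idx + len(marker):].strip()
    return answer or None


def self_consistency(
    prompt: str,
    sample_fn: Callable[[str], str],
    n_samples: int = 10,
) -> Tuple[Optional[str], float]:
    """Sample n_samples reasoning paths and majority-vote their final answers.

    Returns (winning_answer, vote_share). The vote share is computed over
    the paths that produced a parseable answer (an assumption of this
    sketch). Ties go to the answer that was seen first.
    """
    paths = [sample_fn(prompt) for _ in range(n_samples)]
    answers = [a for a in map(extract_answer, paths) if a is not None]
    if not answers:
        return None, 0.0
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


def make_stub_sampler(outputs: List[str]) -> Callable[[str], str]:
    """Assumption for the demo: replay canned outputs instead of calling a model."""
    replay = cycle(outputs)
    return lambda prompt: next(replay)


# Demo with canned reasoning paths standing in for sampled model outputs.
stub = make_stub_sampler([
    "16 - 3 - 4 = 9 eggs, 9 * 2 = 18. Answer: 18",
    "She sells 9 eggs at $2 each. Answer: 18",
    "16 - 3 = 13, 13 * 2 = 26, minus 6. Answer: 20",
    "I am not sure how to proceed.",
    "9 eggs remain, so 18 dollars. Answer: 18",
])
answer, share = self_consistency("How much does she earn per day?", stub, n_samples=5)
print(f"answer={answer} vote_share={share:.2f}")
```

In a real pipeline `sample_fn` would wrap a model call with temperature above zero, and the answer parser would normalize equivalent forms (for example `18`, `18.0`, and `$18`) before counting. Note that a high vote share here only signals agreement among samples; as discussed above, it does not rule out a consistently repeated error.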