AI Debate
AI Debate, proposed by Irving, Christiano, and Amodei (OpenAI, 2018), is a scalable oversight technique in which two AI agents argue opposing sides before a judge, exploiting adversarial incentives to surface errors. It is studied as a way to verify outputs beyond human expertise and to reduce sycophancy by removing the lone-assistant-pleases-judge dynamic.
AI Debate is a proposed AI alignment technique in which two AI agents argue opposing sides of a question in front of a human or model judge, with the goal of letting the judge identify the correct answer even when the judge is less capable than the debaters. It was proposed in a 2018 paper by Geoffrey Irving, Paul Christiano, and Dario Amodei at OpenAI (*AI safety via debate*) as a scalable oversight method for systems whose outputs humans cannot directly verify. The core intuition is adversarial: if one debater says something false or misleading, the other debater is incentivized to expose it, because exposing errors helps win the debate. Under idealized assumptions — honest debate is optimal in equilibrium, the judge is calibrated, and lying is harder to defend than truth-telling — the procedure can in principle scale oversight to questions beyond the judge's direct expertise. Empirical studies have tested debate on factual question-answering and on tasks where the judge has only partial information, with mixed but generally positive evidence that debate improves judge accuracy compared to a single agent or a single advocate. Debate is one of a small family of scalable-oversight proposals alongside iterated amplification, recursive reward modeling, and weak-to-strong generalization. It is also relevant to mitigating Sycophancy in LLM Responses: a debater whose only job is to win against an opponent has weaker incentives to flatter the judge than a single assistant trained on human preference signals. Known limitations include obfuscated arguments that exploit judge blind spots, collusion between debaters with shared training distributions, and the cost of running many rounds. Debate is an active research direction rather than a deployed training pipeline.