Keywords: AI alignment, debate, LLM
Abstract: Large language models (LLMs) can produce fluent but incorrect answers, highlighting a need for methods to improve their truthfulness. Debate among AI agents has been proposed as an alignment strategy to address this challenge. In a debate framework, two LLM “debaters” argue for opposing answers to a question and a judge decides which answer is correct. However, existing debate protocols suffer from issues such as debaters never switching sides and a lack of uncertainty disclosure, which incentivizes overconfident bluffing. We propose an $\textbf{Uncertainty-Aware Role-Switching Debate}$ protocol to address these limitations. In our protocol, two powerful LLM debaters engage in a structured five-phase debate: they present initial answers, cross-examine each other to pinpoint errors, swap roles mid-debate to argue the opposite side, and then each explicitly report their confidence and uncertainties before a final verdict by a separate judge model. This debate format encourages honest self-reflection and forces each model to confront the opponent’s viewpoint. We evaluate our approach on the OpenBookQA science QA benchmark. Without any fine-tuning or external knowledge, the debate-enhanced LLM achieves 74.3\% accuracy, substantially higher than a single-model baseline. Ablation experiments confirm that both the role-switch and uncertainty-reporting phases significantly boost performance. Qualitative analyses further illustrate that our protocol helps expose deceptive arguments and guide the judge toward correct answers. Overall, our results demonstrate that incorporating uncertainty awareness and role-switching in debates can make LLMs more truthful and reliable, offering a promising new avenue for AI alignment.
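The following is a minimal sketch of how the five-phase protocol described in the abstract could be orchestrated, assuming a hypothetical `query_llm` helper standing in for whatever chat-completion API the paper actually uses; the prompts and role names are illustrative, not the authors' exact implementation.

```python
# Hypothetical sketch of the five-phase debate loop described in the abstract.
# `query_llm` is a placeholder for a real LLM client; prompts are illustrative.

def query_llm(role: str, prompt: str) -> str:
    """Placeholder for a call to an LLM chat API on behalf of `role`."""
    raise NotImplementedError("plug in your model client here")

def debate(question: str, answer_a: str, answer_b: str) -> str:
    transcript = [f"Question: {question}"]

    # Phase 1: each debater presents its initial answer and argument.
    for name, ans in (("Debater A", answer_a), ("Debater B", answer_b)):
        opening = query_llm(name, f"Argue that the answer to '{question}' is '{ans}'.")
        transcript.append(f"{name} (opening): {opening}")

    # Phase 2: cross-examination to pinpoint errors in the opponent's argument.
    for name in ("Debater A", "Debater B"):
        critique = query_llm(name, "Point out flaws in your opponent's argument.\n"
                                   + "\n".join(transcript))
        transcript.append(f"{name} (cross-exam): {critique}")

    # Phase 3: role switch -- each debater now argues for the opposite answer.
    for name, ans in (("Debater A", answer_b), ("Debater B", answer_a)):
        switched = query_llm(name, f"Now argue the opposite side: the answer is '{ans}'.")
        transcript.append(f"{name} (role switch): {switched}")

    # Phase 4: uncertainty reporting -- explicit confidence and remaining doubts.
    for name in ("Debater A", "Debater B"):
        report = query_llm(name, "State which answer you now believe is correct, "
                                 "your confidence (0-100%), and your main uncertainties.")
        transcript.append(f"{name} (uncertainty report): {report}")

    # Phase 5: a separate judge model reads the transcript and issues a verdict.
    verdict = query_llm("Judge", "Read the debate transcript and pick the correct "
                                 "answer ('A' or 'B').\n" + "\n".join(transcript))
    return verdict
```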
Supplementary Material: zip
Submission Number: 64