Abstract: Large Language Models (LLMs) have shown exceptional reasoning capabilities, yet selecting the most reliable response from multiple LLMs remains a challenge, especially in resource-constrained settings. Existing approaches often rely on expensive external verifiers, human evaluators, or self-consistency techniques that require multiple samples from a single model. Multi-LLM debate provides a more interactive mechanism, yet it frequently underperforms self-consistency with the best LLM. In this work, we introduce a log-likelihood-based selection framework to enhance reasoning in multi-LLM debate settings. Our approach leverages uncertainty estimation to identify the most confident response while minimizing inference costs. We demonstrate that our method outperforms majority-vote selection and surpasses self-consistency when a large number of model calls is used. Through extensive experiments, we show that multi-LLM collaboration, when guided by uncertainty-aware selection, leads to an improvement of 6.19\% in settings with fewer model calls.
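As a rough illustration of the uncertainty-aware selection idea described above, the following minimal sketch scores each debate participant's final answer by its length-normalized log-likelihood and returns the most confident one. It assumes each model exposes per-token log-probabilities for its own output; the function names and data layout are illustrative, not the paper's implementation.

```python
# Minimal sketch (not the authors' released code) of log-likelihood-based
# response selection among multiple LLM debate candidates.
import math
from typing import List, Tuple


def mean_log_likelihood(token_logprobs: List[float]) -> float:
    """Length-normalized confidence: average log-probability per generated token."""
    return sum(token_logprobs) / max(len(token_logprobs), 1)


def select_most_confident(candidates: List[Tuple[str, List[float]]]) -> str:
    """Return the answer whose generating model assigned it the highest
    average token log-likelihood (i.e., the most confident response)."""
    best_answer, best_score = "", -math.inf
    for answer, token_logprobs in candidates:
        score = mean_log_likelihood(token_logprobs)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer


# Example: three debate participants' final answers, each paired with the
# token log-probabilities reported by the model that produced it.
candidates = [
    ("The answer is 42.", [-0.21, -0.05, -0.33, -0.10]),
    ("The answer is 41.", [-0.80, -0.44, -1.20, -0.65]),
    ("The answer is 42.", [-0.15, -0.09, -0.27, -0.12]),
]
print(select_most_confident(candidates))  # -> "The answer is 42."
```

Unlike majority voting, this selection needs no extra model calls beyond the debate itself, since the log-probabilities are a byproduct of generation.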
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: Multi-LLM Debate, Uncertainty, Log-Likelihood Estimation
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency
Languages Studied: N/A
Submission Number: 6152