CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades

Published: 18 Apr 2026 · Last Modified: 22 Apr 2026 · ACL 2026 Industry Track Poster · CC BY 4.0
Keywords: Multi-Agent Systems, LLM Cascade, Multi-Agent Deliberation, Cost-Aware Inference
TL;DR: We introduce CascadeDebate, a cost-efficient LLM cascade that uses multi-agent deliberation at escalation points and learns task-specific thresholds online, reducing costs while maintaining accuracy.
Abstract: Cascaded LLM systems coordinate models of varying sizes with human experts to balance accuracy, cost, and abstention under uncertainty. However, single-model tiers falter on ambiguous queries: under-confidence triggers premature escalation to costlier models or experts, scaling test-time compute inefficiently. **CascadeDebate** addresses this gap by inserting multi-agent deliberation directly at each tier's escalation boundary. Confidence-based routers activate lightweight agent ensembles only for uncertain cases, so ambiguities can be resolved by consensus within a tier without invoking higher-cost upgrades. Our unified architecture alternates single-model inference with selective multi-agent deliberation across model scales, with human experts as the final fallback, scaling test-time compute dynamically to query difficulty. Across five benchmarks spanning science, medicine, and general knowledge, CascadeDebate outperforms strong single-model cascades and standalone multi-agent systems by up to 26.75%. An online threshold optimizer proves essential, yielding 20.98–52.33% relative accuracy improvement over fixed policies and enabling elastic adaptation to real-world query distributions.
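The routing scheme the abstract describes can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: the `Tier` structure, the use of a single per-tier confidence threshold for both the solo pass and the deliberation consensus, and majority voting as the deliberation rule are all hypothetical simplifications.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Tier:
    """One cascade stage: a model plus its escalation policy (illustrative)."""
    name: str
    answer: Callable[[str], Tuple[str, float]]  # query -> (answer, confidence)
    threshold: float                            # escalate below this confidence
    n_agents: int                               # ensemble size for deliberation

def cascade(query: str, tiers: list, expert: Callable[[str], str]):
    """Route a query up the cascade, deliberating before each escalation."""
    for tier in tiers:
        ans, conf = tier.answer(query)
        if conf >= tier.threshold:
            return ans, tier.name  # confident: answer at this tier
        # Uncertain: spend a little extra compute on multi-agent deliberation
        # at the same tier instead of escalating immediately.
        votes = Counter(tier.answer(query)[0] for _ in range(tier.n_agents))
        top, count = votes.most_common(1)[0]
        if count / tier.n_agents >= tier.threshold:
            return top, f"{tier.name}+debate"  # consensus resolved the ambiguity
        # No consensus either: escalate to the next (costlier) tier.
    return expert(query), "human-expert"  # final fallback
```

In this sketch, deliberation re-queries the same tier's model (standing in for an ensemble of sampled agents) and escalation happens only when neither the solo confidence nor the vote share clears the tier's threshold; the online threshold optimizer would adjust `threshold` per task, which is omitted here.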
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 319