Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees
TL;DR: Improving query allocation in extractive QA by Learning-to-Defer
Abstract: Large Language Models (LLMs) excel at generative language tasks but remain unreliable for structured prediction—particularly in extractive question answering (EQA), where success hinges on precise span selection. These challenges are magnified in resource-constrained environments, such as mobile or embedded systems, where deploying high-capacity models is often infeasible. We propose a \textit{Learning-to-Defer} framework that routes EQA queries across a pool of models with varying capabilities and costs, balancing accuracy against efficiency. Our approach is grounded in statistical decision theory: we define a differentiable surrogate loss whose minimizer provably converges to the Bayes-optimal allocation policy. Experiments on SQuADv1, SQuADv2, and TriviaQA show that our method consistently improves accuracy–efficiency trade-offs relative to static baselines and prior routing heuristics. Our work provides a principled and scalable solution for EQA in both high-performance and on-device deployment settings.
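The deferral objective the abstract describes could be sketched roughly as follows. This is a minimal illustration of a cost-sensitive learning-to-defer surrogate, not the paper's actual loss: the function names, the cost-adjusted reward, and the log-loss form are all assumptions made for the sketch.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def l2d_surrogate_loss(router_logits, correct, costs):
    """Illustrative cost-sensitive surrogate for query allocation.

    router_logits: (N, K) router scores over K candidate models.
    correct:       (N, K) binary indicator that model k answers query i correctly.
    costs:         (K,)   per-query inference cost of each model.
    """
    p = softmax(router_logits)
    # Cost-adjusted reward: accuracy minus inference cost (hypothetical form).
    reward = correct - costs[None, :]
    # Push router mass toward the best cost-adjusted model for each query
    # via a differentiable negative log-likelihood.
    best = reward.argmax(axis=1)
    nll = -np.log(p[np.arange(len(p)), best] + 1e-12)
    return nll.mean()
```

Under this kind of surrogate, minimizing the loss drives the router toward the model with the best accuracy-cost trade-off per query, which is the flavor of Bayes-optimality guarantee the abstract claims (the exact surrogate and consistency proof are in the paper, not reproduced here).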
Submission Number: 1195