Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees
TL;DR: Improving query allocation in extractive QA by Learning-to-Defer
Abstract: Large Language Models (LLMs) excel at generative language tasks but remain unreliable for structured prediction—particularly in extractive question answering (EQA), where success hinges on precise span selection. These challenges are magnified in resource-constrained environments, such as mobile or embedded systems, where deploying high-capacity models is often infeasible. We propose a \textit{Learning-to-Defer} framework that routes EQA queries across a pool of models with varying capabilities and costs, balancing accuracy against efficiency. Our approach is grounded in statistical decision theory: we define a differentiable surrogate loss whose minimizer provably converges to the Bayes-optimal allocation policy. Experiments on SQuADv1, SQuADv2, and TriviaQA show that our method consistently improves accuracy–efficiency trade-offs relative to static baselines and prior routing heuristics. Our work provides a principled and scalable solution for EQA in both high-performance and on-device deployment settings.
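The deferral objective the abstract describes could be sketched roughly as follows. This is a minimal illustration of a cost-sensitive learning-to-defer surrogate, not the paper's actual loss: the function names, the cost-adjusted reward, and the log-loss form are all assumptions made for the sketch.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def l2d_surrogate_loss(router_logits, correct, costs):
    """Illustrative cost-sensitive surrogate for query allocation.

    router_logits: (N, K) router scores over K candidate models.
    correct:       (N, K) binary indicator that model k answers query i correctly.
    costs:         (K,)   per-query inference cost of each model.
    """
    p = softmax(router_logits)
    # Cost-adjusted reward: accuracy minus inference cost (hypothetical form).
    reward = correct - costs[None, :]
    # Push router mass toward the best cost-adjusted model for each query
    # via a differentiable negative log-likelihood.
    best = reward.argmax(axis=1)
    nll = -np.log(p[np.arange(len(p)), best] + 1e-12)
    return nll.mean()
```

Under this kind of surrogate, minimizing the loss drives the router toward the model with the best accuracy-cost trade-off per query, which is the flavor of Bayes-optimality guarantee the abstract claims (the exact surrogate and consistency proof are in the paper, not reproduced here).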
Submission Number: 1195