MIRT: Multi-Dimensional IRT for SLO-Adaptive Multi-Agent Routing

Hak Hyun Kim; Benjamin Huh; Jimin Moon; Chia-Wei Lee; Jason Peng; Wesley J. Marrero; Soroush Vosoughi

Published: 25 May 2026, Last Modified: 27 May 2026

Venue: DEMO 2026 Poster

License: CC BY 4.0

Keywords: Offline-to-Online Adaptation, Multi-Agent Routing, Item Response Theory, Lagrangian Constraint Optimization, Cross-Constraint Transfer

Abstract: Multi-agent LLM routing policies are typically learned offline from cached query-action outcomes, yet must adapt online to shifting cost and latency constraints without retraining. We formalize this offline-to-online adaptation problem and approach it through Item Response Theory (IRT), decomposing each query-action outcome into latent ability and difficulty factors that are learned entirely offline and independently of constraint thresholds. We identify two structural pitfalls: one-dimensional IRT provably reduces to static action selection regardless of query difficulty, and end-to-end training collapses routing diversity. Our method, Multi-dimensional IRT (MIRT), resolves both via $D$-dimensional latent factors and two-stage decomposition. Because the learned factors are constraint-independent, online adaptation reduces to recalibrating only two Lagrangian dual variables, enabling a single offline-trained model to serve shifting SLO regimes. MIRT outperforms the best parametric baseline by +3.9 pp F1 (0.797 vs. 0.758), maintains stable performance across regimes (F1 0.796-0.800), and is the only method keeping both cost and latency violations below the 5% target across all three regimes.
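To make the offline/online split concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard compensatory multi-dimensional IRT form, sigmoid(<discrimination_q, ability_a> - difficulty_q), average-cost and average-latency budgets, and a plain projected dual-ascent update. All function names, parameter names, and the step size are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def success_prob(ability, discrimination, difficulty):
    """Assumed MIRT form: P[q, a] = sigmoid(<discrimination_q, ability_a> - difficulty_q).

    ability:        (num_actions, D) latent factors per action
    discrimination: (num_queries, D) latent factors per query
    difficulty:     (num_queries,)   scalar difficulty per query
    """
    return sigmoid(discrimination @ ability.T - difficulty[:, None])


def route(prob, cost, latency, lam_cost, lam_lat):
    """Per query, pick the action with the highest Lagrangian score."""
    score = prob - lam_cost * cost - lam_lat * latency
    return score.argmax(axis=1)


def recalibrate(prob, cost, latency, cost_slo, lat_slo, steps=200, lr=0.05):
    """Projected dual ascent on the two multipliers only.

    The latent factors (and hence prob) stay frozen; this is an assumed
    update rule, chosen for simplicity.
    """
    lam_cost, lam_lat = 0.0, 0.0
    rows = np.arange(prob.shape[0])
    for _ in range(steps):
        choice = route(prob, cost, latency, lam_cost, lam_lat)
        # Raise a multiplier when its average budget is exceeded, else relax it.
        lam_cost = max(0.0, lam_cost + lr * (cost[rows, choice].mean() - cost_slo))
        lam_lat = max(0.0, lam_lat + lr * (latency[rows, choice].mean() - lat_slo))
    return lam_cost, lam_lat
```

Under these assumptions, setting D = 1 with positive discriminations and zero multipliers makes `route` return the same action for every query, since the score is then monotone in the scalar ability. This is one simple way to see the static-selection pitfall the abstract describes. When the SLO thresholds change, only `recalibrate` would be rerun; `success_prob` is computed once from the offline-trained factors.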

Submission Number: 64
