Keywords: Multi-armed bandits, LLM recommendation
TL;DR: Our paper introduces a novel framework for efficiently selecting the most suitable Large Language Model for each query, balancing performance and cost-effectiveness using contextual bandit algorithms.
Abstract: As Large Language Models (LLMs) continue to expand in both variety and cost, selecting the most appropriate model for each query is becoming increasingly crucial. Many existing works treat this as an offline problem, requiring a data-gathering phase to compile a set of query-answer-reward triplets beforehand. Such approaches often struggle to determine how many triplets are adequate and are prone to overfitting when the data volume is insufficient. To address these limitations, we propose a new solution, the Multi-Armed Router (MAR), which applies multi-armed bandit theory—a perspective previously unexplored in this domain. Unlike previous works that base decision-making solely on regression over static datasets (i.e., constructed triplets), our method treats this as an online multi-LLM recommendation problem, which better mirrors real-world applications. Moreover, rather than vanilla multi-armed bandits, our framework employs contextual bandit algorithms to navigate the trade-off between exploring new models and exploiting proven ones, while accounting for the dependency between the input query and the answer's reward. Because no off-the-shelf dataset exists in this area, we construct WildArena, a dataset of 4,029 real-world user queries. Each query is paired with seven open-ended responses, one from each of seven leading LLMs, and each response is scored using the LLM-as-a-Judge framework. We hope that this new perspective and the dataset will facilitate research on per-query LLM routing.
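The abstract describes per-query routing as a contextual bandit problem. As a rough illustration only, here is a minimal LinUCB-style sketch, assuming query features come from some embedding function and the reward is the judge score; the class name, `embed`, and `judge_score` are hypothetical, and the actual MAR algorithm in the paper may differ.

```python
import numpy as np

class LinUCBRouter:
    """Minimal LinUCB-style contextual bandit over a set of candidate LLMs.

    Hypothetical sketch: the paper's MAR framework is not shown here, so this
    only illustrates the general contextual-bandit routing idea it describes.
    """

    def __init__(self, n_models, dim, alpha=1.0):
        self.alpha = alpha                                 # exploration strength
        self.A = [np.eye(dim) for _ in range(n_models)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_models)]  # per-arm reward vectors

    def select(self, x):
        """Pick the LLM with the highest upper confidence bound for query features x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge-regression reward estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update the chosen arm with the observed (e.g., judge-scored) reward."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Usage sketch: route a stream of queries, where embed() maps a query to a
# feature vector and judge_score() returns the LLM-as-a-Judge reward
# (both functions are hypothetical placeholders).
# router = LinUCBRouter(n_models=7, dim=768)
# for query in stream:
#     x = embed(query)
#     arm = router.select(x)
#     router.update(arm, x, judge_score(query, arm))
```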
Primary Area: learning on time series and dynamical systems
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3590