Large Language Model Routing with Benchmark Datasets

Published: 10 Jul 2024, Last Modified: 26 Aug 2024 · COLM · CC BY 4.0
Research Area: Compute efficient LMs, Learning algorithms for LMs
Keywords: Model selection, LLM routing, OOD Generalization
TL;DR: We reuse benchmark evaluations to learn router models for selection of LLMs on unseen tasks.
Abstract: The number of open-source Large Language Models (LLMs) grows daily, as does the number of available benchmark datasets used to evaluate LLMs. While some models dominate these benchmarks, no single model achieves the best accuracy on all tasks and use cases. In light of this observation, we address the challenge of selecting the best LLM from a collection of pre-trained models, given a new task. While related work relies on evaluating each candidate model on a set of labeled examples, our new formulation does not assume any labeled data from the new task is available. Instead, we repurpose a collection of benchmark datasets---which may focus on different tasks than the one at hand---to learn a "router" model for LLM selection from inputs only; this problem reduces to a collection of binary classification tasks. Empirically, our strategy consistently improves performance over using any single model for all tasks.
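The reduction described in the abstract---learning a router from inputs only, as a collection of per-model binary classification tasks---can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all names, the feature representation, and the plain logistic-regression scorers are assumptions. The idea is that for each candidate LLM we fit a classifier predicting "this model answers this input correctly" from benchmark inputs and observed correctness, then route each new input to the model with the highest predicted success score.

```python
import math

def train_router(features, correctness):
    """Fit one logistic-regression-style scorer per model (hypothetical sketch).

    features:    list of input feature vectors from the benchmark datasets
    correctness: dict model_name -> list of 0/1 labels indicating whether
                 that model answered the corresponding benchmark input correctly
    """
    routers = {}
    dim = len(features[0])
    for model, labels in correctness.items():
        w = [0.0] * dim
        b = 0.0
        for _ in range(200):  # a few epochs of full-batch gradient descent
            for x, y in zip(features, labels):
                z = sum(wi * xi for wi, xi in zip(w, x)) + b
                p = 1.0 / (1.0 + math.exp(-z))   # sigmoid: predicted P(correct)
                g = p - y                        # gradient of the log-loss
                w = [wi - 0.1 * g * xi for wi, xi in zip(w, x)]
                b -= 0.1 * g
        routers[model] = (w, b)
    return routers

def route(routers, x):
    """Route an unlabeled input to the model with the highest predicted score."""
    def score(model):
        w, b = routers[model]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(routers, key=score)
```

For example, with two candidate models whose benchmark correctness patterns depend on different input features, the router learns to send each new (unlabeled) input to whichever model is predicted to succeed on it.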
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1078