Low Dimensional Embeddings for Model Capability Understanding

Published: 01 Jun 2026, Last Modified: 11 Jun 2026, Venue: AdaptFM Poster, Readers: Everyone, License: CC BY 4.0
Keywords: Efficient Model Representation, Query Routing, Attention-based Encoder
TL;DR: We present LOCUS, an attention-based method that produces low-dimensional vector embeddings compactly representing a language model's capability across queries.
Abstract: The rapidly growing ecosystem of Large Language Models (LLMs) makes it increasingly difficult to manage and utilize the expanding model pool. We propose LOCUS, an attention-based method that produces low-dimensional embeddings capturing a model's capabilities across queries. LOCUS deterministically generates embeddings from query encodings and evaluation scores via a forward pass, enabling new models to be added and existing embeddings to be refined without retraining. A correctness predictor built on these embeddings achieves state-of-the-art routing accuracy on unseen queries. Experiments show that LOCUS requires up to 4.8x fewer query evaluations than baselines while producing robust, geometrically meaningful embeddings whose proximity reflects model similarity, supporting model comparison, clustering, portfolio selection, and proxying unavailable models.
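Illustrative sketch (not the authors' implementation): the abstract says that embeddings are produced from query encodings and evaluation scores by a deterministic forward pass. The Python snippet below shows one generic way to do this, single-head attention pooling with a seed vector. Every name in it (`embed_model`, `init_params`, `W_k`, `W_v`, `seed`), the dimensions, and the random untrained weights are assumptions made for illustration; the actual LOCUS architecture may differ.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def init_params(d, h, k, rng):
    # Hypothetical encoder weights; in practice these would be trained once
    # and then frozen, so embedding a new model needs no retraining.
    return {
        "W_k": rng.normal(size=(d + 1, h)) / np.sqrt(d + 1),
        "W_v": rng.normal(size=(d + 1, k)) / np.sqrt(d + 1),
        "seed": rng.normal(size=h),
    }

def embed_model(query_encodings, scores, params):
    # One token per evaluated query: its encoding concatenated with the score.
    tokens = np.concatenate([query_encodings, scores[:, None]], axis=1)
    keys = tokens @ params["W_k"]      # shape (n, h)
    values = tokens @ params["W_v"]    # shape (n, k)
    # The seed vector attends over all tokens (scaled dot-product attention).
    logits = keys @ params["seed"] / np.sqrt(keys.shape[1])
    weights = softmax(logits)          # shape (n,), sums to 1
    return weights @ values            # shape (k,): the model embedding

rng = np.random.default_rng(0)
d, h, k = 16, 8, 4                     # assumed sizes, chosen arbitrarily
params = init_params(d, h, k, rng)

queries = rng.normal(size=(32, d))     # stand-in query encodings
scores = rng.integers(0, 2, size=32).astype(float)  # stand-in 0/1 correctness

emb_first = embed_model(queries, scores, params)
emb_again = embed_model(queries, scores, params)
# Using only a subset of evaluations gives a coarser embedding that can be
# refined later by rerunning the same forward pass with more evaluations.
emb_partial = embed_model(queries[:10], scores[:10], params)
```

In this sketch, adding a model or refining its embedding amounts to calling `embed_model` again with that model's evaluations while the weights stay fixed. Because attention pooling is a weighted sum over tokens, the result does not depend on the order in which evaluations are supplied.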
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 155