Domain-Aware LLM Routing During Generation

Published: 2024 · Last Modified: 07 Oct 2025 · IEEE Big Data 2024 · CC BY-SA 4.0
Abstract: Large Language Model (LLM) routing architectures make it possible to manage and deploy multiple fine-tuned expert models behind a single inference endpoint. A key challenge, however, is that individually fine-tuned models may suffer from limited generation diversity and can produce hallucinations when responses extend beyond their specialization domains. To address this, we introduce the Dynamic Expert Router, an architecture that dynamically routes token generation across expert LLMs based on domain classification. The system monitors the output generated by the expert LLMs and reroutes to a different model if the response begins to drift away from the relevant domain. We present preliminary experiments in which we identify domain shifts during generation by analyzing the response sentences of a Llama 3 model. These findings motivate our research questions on optimizing embedding techniques, clustering methods, and routing mechanisms.
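The core mechanism described in the abstract, monitoring generated sentences and rerouting when they drift from the active domain, can be sketched roughly as follows. This is a minimal illustration under assumed design choices, not the paper's implementation: the bag-of-words `embed` function is a toy stand-in for whatever embedding model the authors use, and all names (`DomainShiftRouter`, `centroid`, the similarity threshold) are hypothetical.

```python
import math
import re


def embed(sentence):
    """Toy sentence embedding: a unit-normalized bag-of-words dict.
    A real system would use a learned sentence-embedding model instead."""
    counts = {}
    for tok in re.findall(r"[a-z]+", sentence.lower()):
        counts[tok] = counts.get(tok, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {t: v / norm for t, v in counts.items()}


def cosine(a, b):
    """Cosine similarity between two unit vectors stored as dicts."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(sentences):
    """Unit-normalized mean of sentence embeddings for one domain."""
    acc = {}
    for s in sentences:
        for t, w in embed(s).items():
            acc[t] = acc.get(t, 0.0) + w
    norm = math.sqrt(sum(v * v for v in acc.values())) or 1.0
    return {t: v / norm for t, v in acc.items()}


class DomainShiftRouter:
    """Checks each generated sentence against the active expert's domain
    centroid and reroutes to the best-matching domain when similarity
    falls below a threshold (threshold value here is illustrative)."""

    def __init__(self, domain_centroids, threshold=0.3):
        self.centroids = domain_centroids  # domain name -> centroid dict
        self.threshold = threshold

    def best_domain(self, sentence):
        emb = embed(sentence)
        return max(self.centroids, key=lambda d: cosine(emb, self.centroids[d]))

    def check(self, active_domain, sentence):
        """Return the domain whose expert should generate the next sentence."""
        score = cosine(embed(sentence), self.centroids[active_domain])
        if score < self.threshold:
            return self.best_domain(sentence)  # domain shift detected: reroute
        return active_domain  # stay with the current expert
```

In this sketch, routing is evaluated at sentence granularity, matching the abstract's description of analyzing response sentences; a per-token variant would apply the same check against a sliding window of recent output.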