Keywords: Large Language Models (LLMs), Legal Question Answering, Human–AI Collaboration, Two-Stage Router
Paper Type: Short papers / work-in-progress
TL;DR: The paper proposes a Chain-of-Thought (CoT)-Guided Two-Stage Routing Framework to optimize resource allocation in legal QA.
Abstract: Legal question-answering systems powered by Large Language Models can significantly enhance the efficiency and accessibility of legal services. However, their practical deployment is hindered by prohibitive computational costs and the risk of generating unreliable advice, leading to resource misallocation and safety concerns. Model routing is therefore essential, but generic routing solutions fail to meet the stringent demands of the legal domain. In this paper, we propose a Chain-of-Thought (CoT)-Guided Two-Stage Routing Framework to optimize resource allocation in legal QA. Our framework consists of three modules: (1) an LLM fine-tuned with Group Relative Policy Optimization (GRPO) to generate high-quality CoTs as routing features; (2) a human–machine gate that decides whether to defer a query to a human expert or answer automatically; and (3) a contextual-bandit selector that maximizes expected net utility, trading off predicted answer quality against inference cost. Experimental results demonstrate the effectiveness of the proposed framework.
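The following is a minimal, illustrative sketch (not the authors' implementation) of how the human–machine gate and the net-utility-maximizing selector described in the abstract could fit together. The quality predictor, cost values, deferral threshold, and cost weight `lambda_cost` are all assumptions made for illustration.

```python
# Illustrative sketch of a two-stage router (all numbers and names are assumptions):
# (1) a human-machine gate defers the query when even the best predicted quality
#     falls below a threshold;
# (2) otherwise a selector picks the model maximizing expected net utility
#     utility = predicted_quality - lambda_cost * inference_cost.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    cost: float               # assumed per-query inference cost (arbitrary units)
    predicted_quality: float  # assumed quality score in [0, 1], e.g. from CoT-based features

def route(candidates, defer_threshold=0.6, lambda_cost=0.05):
    """Return ('human', None) to defer, else ('model', name) of the chosen model."""
    # Selector: maximize expected net utility (quality minus weighted cost).
    best = max(candidates, key=lambda c: c.predicted_quality - lambda_cost * c.cost)
    # Gate: defer to a human expert if the best option is still too unreliable.
    if best.predicted_quality < defer_threshold:
        return "human", None
    return "model", best.name

if __name__ == "__main__":
    pool = [
        Candidate("small-llm", cost=1.0, predicted_quality=0.55),
        Candidate("large-llm", cost=8.0, predicted_quality=0.82),
    ]
    print(route(pool))  # ('model', 'large-llm') under these illustrative numbers
```

In practice, `predicted_quality` would come from the contextual-bandit module conditioned on the GRPO-generated CoT features rather than being a fixed number, and the threshold and cost weight would be tuned to the deployment's risk and budget requirements.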
Poster PDF: pdf
Submission Number: 55