Revealing and Pushing Legal Reasoning Boundary of Large Language Models

ICLR 2026 Conference Submission17582 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Chain-of-Thought, Reasoning Boundary
TL;DR: We propose a simple but effective method to improve the legal reasoning boundary of LLMs.
Abstract: Large language models (LLMs) have made significant strides in the legal domain, such as recommending charges based on given criminal facts. However, existing LLMs struggle with the long-tail problem in the legal domain. This is primarily due to the imbalanced distribution of legal data during pre-training, which results in uneven inference capabilities, especially for charges in the long-tail categories. Moreover, common methods for enhancing LLMs' reasoning abilities, such as the chain-of-thought family and retrieval-augmented generation family of approaches, also fail to solve the long-tail charge prediction problem. In this work, we reveal that solving this issue requires supplying LLMs with the legal knowledge they lack, particularly regarding easily confused and long-tail legal concepts. We propose a legal knowledge retrieval method (denoted as KnowTail) that includes a Knowledge Inspector to identify a model's knowledge gaps and a Knowledge Integrator to provide tailored legal knowledge for accurate legal reasoning. Extensive experiments on real-world datasets demonstrate that our method significantly surpasses prior state-of-the-art methods, e.g., achieving average F1 gains of 22.91% for overall charges and 22.94% for tail charges on GPT-4o, and gains of 26.74% (overall) and 27.16% (tail) on Qwen3-8B. Our code and data are available at: https://anonymous.4open.science/r/KnowTail-F860.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17582