LegalSearchLM: A Generative Legal Language Model for Legal Case Retrieval

ACL ARR 2025 February Submission 8247 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Legal Case Retrieval (LCR), which retrieves cases relevant to a given query case, is a fundamental task for legal professionals in legal research and decision-making. Previous studies have focused on lexical matching or embedding-based retrieval methods, which often fail to capture detailed legal factors in complex cases. In this paper, we introduce a benchmark and a novel retrieval approach: (1) LEGAR BENCH, the first Korean LCR benchmark covering the widest range of criminal case types, with two dataset versions based on different relevance criteria; (2) LegalSearchLM, a generative retrieval model that generates key legal elements from query cases with complex legal conditions through entry point-aware identifier generation. Experiments on LEGAR BENCH show that LegalSearchLM outperforms the strongest baseline by 17%, achieving state-of-the-art results. It also demonstrates remarkable out-of-domain performance across diverse criminal cases.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Legal Case Retrieval, Generative Retrieval
Contribution Types: Model analysis & interpretability
Languages Studied: Korean
Submission Number: 8247