Abstract: Legal Case Retrieval (LCR), which retrieves cases relevant to a query case, is a fundamental task for legal professionals in legal research and decision-making. Previous studies have focused on lexical matching or embedding-based retrieval methods, which often fail to capture detailed legal factors in complex cases. In this paper, we introduce a benchmark and a novel retrieval approach: (1) LEGAR BENCH, the first Korean LCR benchmark covering the widest range of criminal case types, supporting two dataset versions based on different relevance criteria; (2) LegalSearchLM, a generative retrieval model that generates key legal elements from query cases with complex legal conditions through entry point-aware identifier generation. Our experiments on LEGAR BENCH show that LegalSearchLM outperforms the strongest baseline by 17%, achieving state-of-the-art results. It also demonstrates remarkable out-of-domain performance across diverse criminal cases.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Legal Case Retrieval, Generative Retrieval
Contribution Types: Model analysis & interpretability
Languages Studied: Korean
Submission Number: 8247