Information-Theoretic Legal Issue Identification and Reward Modeling on Court Cases

ACL ARR 2025 February Submission8327 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Legal issue identification is a crucial first step in legal analysis, yet more than half of people worldwide struggle to meet their civil justice needs. Although Large Language Models (LLMs) have shown promise in many application domains, their effectiveness at identifying legal issues in real-world court cases remains understudied. Previous evaluations have been limited to simplified scenarios or textbook examples, which lack the complexity of actual cases. To address this gap, we present LIC, a dataset of 769 real-world court cases pertaining to Malaysia's Contract Act, with facts and legal issues extracted using GPT-4o and validated by top law students and junior lawyers. We propose an approach that generates and ranks legal issue candidates by incrementally incorporating case facts, together with a novel reward model based on mutual information (MI) for reranking. During training, our method uses a soft-threshold function to align MI with the estimated relevance between issue candidates and facts. Experimental results show that our method outperforms the baselines on our test set. This work advances automated legal issue identification and provides a substantial dataset for future research in legal AI. Our dataset and source code will be made publicly available upon acceptance.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: legal NLP, NLP datasets
Contribution Types: Data resources
Languages Studied: English
Submission Number: 8327
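
Illustrative sketch (not from the paper): the abstract describes reranking issue candidates with an MI-based reward passed through a soft-threshold function, but gives no formulas. The snippet below shows one possible reading of that idea, assuming a pointwise MI score computed from two log-probabilities and a sigmoid used as the soft threshold. The function names, the parameters `tau` and `temperature`, and the toy log-probabilities are all hypothetical and are not the authors' implementation.

```python
import math


def pmi(logp_issue_given_facts: float, logp_issue: float) -> float:
    """Pointwise mutual information: log p(issue | facts) - log p(issue)."""
    return logp_issue_given_facts - logp_issue


def soft_threshold(score: float, tau: float = 0.0, temperature: float = 1.0) -> float:
    """Assumed soft threshold: a sigmoid centred at tau, mapping a score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(score - tau) / temperature))


def rerank(candidates):
    """Sort (issue_text, logp_given_facts, logp_prior) tuples by gated PMI, best first."""
    scored = [
        (text, soft_threshold(pmi(logp_cond, logp_prior)))
        for text, logp_cond, logp_prior in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy log-probabilities, invented purely for illustration.
candidates = [
    ("Was there a valid acceptance?", -2.0, -5.0),
    ("Is the defendant a minor?", -6.0, -5.5),
    ("Was consideration adequate?", -3.0, -4.0),
]
ranked = rerank(candidates)
for issue, reward in ranked:
    print(f"{reward:.3f}  {issue}")
```

In this reading, a candidate whose probability rises once the case facts are known receives a reward above 0.5, and one whose probability falls receives a reward below 0.5. How the paper actually estimates MI and defines its soft threshold may differ.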