Abstract: General and legal-domain LLMs have demonstrated strong performance across a variety of LegalAI tasks. However, their current evaluations are not aligned with the fundamental logic of legal reasoning, the legal syllogism, which hinders trust and understanding from legal experts. To bridge this gap, we introduce LAiW, the first Chinese legal LLM benchmark structured around the legal syllogism. We evaluate legal LLMs at three levels of capability, each reflecting a progressively more complex stage of the legal syllogism: fundamental information retrieval, legal principle inference, and advanced legal application, covering a wide range of tasks across different legal scenarios. Our automatic evaluation reveals that, despite their ability to answer complex legal questions, LLMs lack the inherent logical process of the legal syllogism, which poses a barrier to their acceptance by legal professionals. Manual evaluation with legal experts confirms this finding and further highlights the importance of pre-training for strengthening the legal syllogism capability of LLMs. Future research may prioritize closing this gap to unlock the full potential of LLMs in legal applications.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Large Language Models, Legal Syllogism, Evaluation, Benchmark
Contribution Types: Data resources
Languages Studied: English, Chinese
Submission Number: 4936