Abstract: Large language models (LLMs) have made significant progress in natural language processing tasks and have shown considerable potential in the legal domain. However, legal applications often impose high requirements on accuracy, reliability, and fairness. Applying existing LLMs to legal systems without carefully evaluating their potential and limitations could pose significant risks in legal practice. Therefore, to facilitate the healthy development and application of LLMs in the legal domain, we propose CoLLaM, a comprehensive benchmark for evaluating LLMs in the legal domain. Specifically, CoLLaM is developed based on the language abilities of LLMs and the practical requirements of the legal domain. It introduces a new legal cognitive ability taxonomy (LCAT) featuring six distinctive levels: Memorization, Understanding, Logic Inference, Discrimination, Generation, and Ethic. Leveraging this taxonomy, we collected 13,650 questions across 23 tasks and evaluated 38 open-source and commercial LLMs on them. Our experimental results yield interesting findings and indicate that applying LLMs in the legal domain still has a long way to go. The details of CoLLaM can be found on the anonymous website \url{https://anonymous.4open.science/r/CoLLaM-31F2}.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data resources
Languages Studied: Chinese, English