GreekBarBench: A Challenging Benchmark for Free-Text Legal Reasoning and Citations

ACL ARR 2025 May Submission 4247 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: We introduce GreekBarBench, a benchmark that evaluates LLMs on legal questions across five different legal areas from the Greek Bar exams, requiring citations to statutory articles and case facts. To tackle the challenges of free-text evaluation, we propose a three-dimensional scoring system combined with an LLM-as-a-judge approach. We also develop a meta-evaluation benchmark to assess the correlation between LLM judges and human expert evaluations, revealing that simple, span-based rubrics improve their alignment. Our systematic evaluation of 13 proprietary and open-weight LLMs shows that while the best models outperform the average expert score, they fall short of the 95th percentile of experts.
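To make the evaluation setup concrete, the sketch below shows one way an LLM-as-a-judge could score a free-text answer along three dimensions using short span-based rubric references. The dimension names, score scale, and the `query_llm` callable are illustrative assumptions, not the paper's actual rubric or judging interface.

```python
# Minimal sketch of three-dimensional LLM-as-a-judge scoring with
# span-based rubrics. Dimension names, scale, and query_llm are assumptions.
from statistics import mean

DIMENSIONS = ["legal_reasoning", "statutory_citations", "fact_citations"]  # assumed


def build_judge_prompt(question, answer, rubric_spans):
    """Compose a judge prompt that points the LLM at short rubric spans
    (reference passages) it should check the answer against."""
    spans = "\n".join(f"- {s}" for s in rubric_spans)
    return (
        f"Question:\n{question}\n\n"
        f"Candidate answer:\n{answer}\n\n"
        f"Reference rubric spans:\n{spans}\n\n"
        f"Score each dimension ({', '.join(DIMENSIONS)}) from 0 to 10."
    )


def judge_answer(question, answer, rubric_spans, query_llm):
    """query_llm is any callable taking a prompt and returning a dict of
    dimension -> numeric score (placeholder interface, not a real API)."""
    prompt = build_judge_prompt(question, answer, rubric_spans)
    scores = query_llm(prompt)  # e.g. {"legal_reasoning": 7, ...}
    return mean(scores[d] for d in DIMENSIONS)


if __name__ == "__main__":
    # Toy usage with a stubbed judge that returns fixed scores.
    stub = lambda prompt: {d: 6 for d in DIMENSIONS}
    print(judge_answer("Sample exam question", "Sample candidate answer",
                       ["Article 914 of the Greek Civil Code ..."], stub))
```

In practice the averaged score would be compared against human expert scores on the meta-evaluation benchmark to measure judge-human correlation; the simple mean over dimensions here is only one possible aggregation choice.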
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: legal NLP
Contribution Types: Data resources
Languages Studied: Greek
Keywords: legal NLP
Submission Number: 4247