LawBench: Benchmarking Legal Knowledge of Large Language Models

Published: 01 Jan 2024, Last Modified: 04 Mar 2025 · EMNLP 2024 · CC BY-SA 4.0
Abstract: We present LawBench, the first evaluation benchmark composed of 20 tasks designed to assess the ability of Large Language Models (LLMs) to perform Chinese legal tasks. LawBench is meticulously crafted to enable precise assessment of LLMs' legal capabilities across three cognitive levels that correspond to the widely accepted Bloom's cognitive taxonomy. Using LawBench, we conduct a comprehensive evaluation of 21 popular LLMs and present the first comparative analysis of the empirical results, revealing their relative strengths and weaknesses. All data, model predictions, and evaluation code are available at https://github.com/open-compass/LawBench.
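Since the benchmark organizes its 20 tasks into three cognitive levels (legal knowledge memorization, understanding, and applying), per-model results are naturally summarized as per-level averages of the per-task scores. Below is a minimal sketch of that roll-up; the task ids, scores, and helper function are illustrative assumptions for this example, not the repository's actual evaluation API.

```python
# Hypothetical illustration (not LawBench's actual evaluation code):
# averaging per-task scores into the three cognitive levels.
from statistics import mean

# Assumed example scores for one model, keyed by task id; the real
# benchmark defines 20 tasks, listed in the GitHub repository.
task_scores = {
    "1-1": 55.0, "1-2": 38.5,               # memorization tasks
    "2-1": 62.3, "2-2": 47.1, "2-3": 70.4,  # understanding tasks
    "3-1": 58.9, "3-2": 41.2,               # applying tasks
}

# Map each task-id prefix to its cognitive level.
levels = {"1": "memorization", "2": "understanding", "3": "applying"}

def level_averages(scores: dict[str, float]) -> dict[str, float]:
    """Group task scores by cognitive level and average within each group."""
    grouped: dict[str, list[float]] = {name: [] for name in levels.values()}
    for task_id, score in scores.items():
        grouped[levels[task_id.split("-")[0]]].append(score)
    return {name: round(mean(vals), 2) for name, vals in grouped.items() if vals}

print(level_averages(task_scores))
# {'memorization': 46.75, 'understanding': 59.93, 'applying': 50.05}
```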