FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: We present a benchmark suite of four datasets for evaluating the fairness of pre-trained legal language models and the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, nationality/region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques and show that none of these combinations guarantees fairness or consistently mitigates group disparities. Furthermore, we analyze what causes performance differences across groups, and how group-robust fine-tuning techniques fail to mitigate group disparities under both representation inequality and temporal distribution shift.
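To make the evaluation concrete, the sketch below shows one common way to quantify group disparity in a classifier's performance: compute a score per protected-attribute group, then summarize the spread (standard deviation across groups) and the worst-group score. The function names, the use of accuracy rather than the paper's exact metrics, and the summary statistics are illustrative assumptions, not the benchmark's official definitions.

```python
# Hypothetical sketch of group-disparity measurement, in the spirit
# of fairness benchmarks like FairLex. Metric choices (accuracy,
# std. deviation, worst-group score) are illustrative assumptions.
from collections import defaultdict
from statistics import pstdev


def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each protected-attribute group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}


def group_disparity(scores):
    """Population std. deviation of group-wise scores.

    0.0 means performance is perfectly equal across groups."""
    return pstdev(scores.values())


def worst_group(scores):
    """An alternative summary: the lowest-performing group's score."""
    return min(scores.values())


# Toy usage with a hypothetical binary attribute "a"/"b":
scores = per_group_accuracy(
    y_true=[1, 0, 1, 1],
    y_pred=[1, 0, 0, 1],
    groups=["a", "a", "b", "b"],
)
# scores == {"a": 1.0, "b": 0.5}
# group_disparity(scores) == 0.25, worst_group(scores) == 0.5
```

A model can score well on average while still failing a minority group, which is why worst-group performance is often reported alongside the aggregate metric.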