AuthBench: A Large-Scale Multilingual Benchmark for Authorship Representation across Genres and Lengths
Keywords: Authorship Representation, Authorship Attribution, Benchmark, Dataset
Abstract: We introduce AuthBench, a large-scale multilingual benchmark for authorship representation, enabling evaluation of authorship attribution and verification across numerous genres and lengths. AuthBench contains 293,029 documents of diverse lengths (short, medium, long, and extra-long), authored by 53,281 individuals across ten widely used languages (en, zh, hi, es, fr, ar, ru, de, ja, ko), spanning nine primary genres with 62 fine-grained subgenres. AuthBench supports two complementary evaluations: (i) authorship attribution, operationalized as same-author document retrieval, scored by Success@K, Recall@K, and nDCG@K; and (ii) authorship verification, i.e., same-author binary decisions over query–candidate pairs, scored by equal error rate (EER). We comprehensively evaluate state-of-the-art (SOTA) instruction-tuned and embedding models on AuthBench. Experiments show that performance remains far from saturated: the best Success@5 reaches only 0.542, and the best overall EER is 0.078. Moreover, SOTA models exhibit substantial performance variation across languages, document lengths, and genres, highlighting persistent challenges for robust authorship modeling. We release AuthBench along with its evaluation toolkit anonymously at https://anonymous.4open.science/r/AURA_Bench-366E/README.md.
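For concreteness, the two headline metrics can be computed as in the minimal Python sketch below. This is an illustrative implementation of Success@K and EER under standard definitions (at least one same-author hit in the top K; threshold where false-accept and false-reject rates cross), not the benchmark's released toolkit; the input format is assumed.

```python
import numpy as np

def success_at_k(ranked_labels, k):
    """Fraction of queries with >= 1 same-author document in the top K.

    ranked_labels: list of 0/1 arrays, one per query, ordered by
    retrieval score (descending); 1 marks a same-author candidate.
    """
    return float(np.mean([np.asarray(r)[:k].any() for r in ranked_labels]))

def equal_error_rate(scores, labels):
    """EER for same-author verification over query-candidate pairs.

    scores: similarity scores; labels: 1 = same author, 0 = different.
    Sweeps the decision threshold and returns the rate where the
    false-accept and false-reject rates are (approximately) equal.
    """
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    pos, neg = labels.sum(), (1 - labels).sum()
    far = np.cumsum(1 - labels) / neg        # negatives accepted so far
    frr = (pos - np.cumsum(labels)) / pos    # positives still rejected
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)
```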
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, multilingual corpora, automatic creation and evaluation of language resources, NLP datasets
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English, Chinese, Hindi, Spanish, French, Arabic, Russian, German, Japanese, Korean
Submission Number: 442