SuperGLEBer: German Language Understanding Evaluation Benchmark

Anonymous

16 Oct 2023 | ACL ARR 2023 October Blind Submission | Readers: Everyone

Abstract: We assemble a broad Natural Language Understanding benchmark suite for the German language and use it to evaluate a wide array of existing German-capable models, in order to build a better understanding of the current state of German LLMs. Our benchmark consists of 29 different tasks spanning task types such as document classification, sequence tagging, document embedding and question answering. We evaluate 10 different German-pretrained models and thereby chart the landscape of German LLMs. In our comprehensive evaluation we find that encoder models are a good choice for most tasks, but also that the largest encoder model does not necessarily perform best on every task. We make our benchmark suite and a leaderboard publicly available at upon-acceptance.com and encourage the community to contribute new tasks and evaluate more models on it.

Paper Type: long

Research Area: Resources and Evaluation

Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources

Languages Studied: German

Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.
