Language Level Classification on German Texts using a Neural Approach

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Studies on language level classification (LLC) for German are scarce, and most of the few that exist rely on feature engineering. To the best of our knowledge, there is no deep learning approach for German texts yet. This paper shows that LLC can also be successfully applied to German texts by exploiting several pre-existing neural network architectures. Seven diverse corpora form the data basis for training the networks: a web-scraped corpus, a corpus created from newspaper articles, three second-language-learner corpora, a corpus created by a company that translates complex texts into incrementally simplified versions, and a corpus created from a collection of written examinations covering the whole CEFR spectrum (A1-C2). An approach based on the BERT architecture yielded the best results, with the highest F1 scores reaching 1.00 at the document level and 0.83 at the sentence level.
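To give a concrete sense of what a BERT-based CEFR classifier of this kind can look like, the sketch below fine-tunes a German BERT checkpoint for six-way classification (A1-C2) with Hugging Face Transformers. The checkpoint name (bert-base-german-cased), hyperparameters, and toy training examples are illustrative assumptions and not the setup described by the authors.

```python
# Illustrative sketch: fine-tuning a German BERT model for six-way CEFR
# classification (A1-C2). Model name, hyperparameters, and the toy data
# below are assumptions, not the paper's actual configuration.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

class CEFRDataset(Dataset):
    """Wraps (text, level) pairs as tokenized tensors for the Trainer."""
    def __init__(self, texts, levels, tokenizer, max_len=256):
        self.encodings = tokenizer(texts, truncation=True,
                                   padding="max_length", max_length=max_len)
        self.labels = [LEVELS.index(lvl) for lvl in levels]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# A publicly available German BERT checkpoint; the abstract does not say
# which pre-trained model the authors used.
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(LEVELS))

# Hypothetical toy examples standing in for the seven corpora described above.
train_ds = CEFRDataset(
    ["Ich heiße Anna und wohne in Berlin.",
     "Die Interpretation der Quantenmechanik bleibt umstritten."],
    ["A1", "C1"], tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cefr-bert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
)
trainer.train()
```

The same setup works at document or sentence granularity; only the unit of text passed to the tokenizer changes.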