Reducing Processing Time and Enhancing Classification Performance: Shortening Strategies for German Public Contributions
Abstract: Public participation, the voluntary engagement of citizens in urban decision-making, requires significant time and human resources, so automating the evaluation of contributions is essential. Classifying citizens' proposals is one of the prevalent analytical tasks. Over the years, many studies have worked on automating this procedure, and Natural Language Processing (NLP) methods are among the most effective. Nevertheless, most existing techniques, despite promising results, are optimized for English. Moreover, pre-trained NLP models such as BERT are limited in the length of the texts they can process. Hence, this paper focuses on shortening public proposals written in German and considers two shortening techniques: truncation and abstractive summarization. The main aim is to explore how pre-trained models such as BERT perform in classifying shortened German-language contributions. For this purpose, a German BERT model fine-tuned on the MLSUM DE dataset and the multilingual BART model are considered for text summarization. The results reveal that applying shortening techniques to long contributions reduces model development time by an average of 48% on CPU and 36% on GPU while improving classification performance. Moreover, the multilingual BART model performs slightly better than the BERT model fine-tuned on the MLSUM DE dataset.
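As a minimal illustration of the truncation strategy mentioned in the abstract (summarization, by contrast, requires a pre-trained sequence-to-sequence model such as BART), a contribution can simply be cut to a model's input limit before classification. The 512-token limit matches BERT's maximum sequence length, but the whitespace tokenization below is a simplifying assumption; real BERT models count WordPiece subtokens produced by their own tokenizer:

```python
def truncate_contribution(text: str, max_tokens: int = 512) -> str:
    """Keep only the first max_tokens whitespace-delimited tokens.

    Simplification: BERT actually counts WordPiece subtokens
    (e.g. via a HuggingFace tokenizer), not whitespace tokens,
    so this is a sketch of the idea rather than the paper's code.
    """
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

# A hypothetical overlong German contribution (600 repeated tokens).
long_text = "Der Vorschlag betrifft " + "Radwege " * 600
short_text = truncate_contribution(long_text)
print(len(short_text.split()))  # 512
```

Short contributions pass through unchanged, so truncation only affects the long tail of inputs that exceed the model limit.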
Paper Type: Long
Research Area: Summarization
Research Area Keywords: Summarization
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency, Data analysis
Languages Studied: German
Submission Number: 400