Abstractive Text Summarization for Icelandic

Published: 20 Mar 2023, Last Modified: 14 Apr 2023
Venue: NoDaLiDa 2023
Readers: Everyone
Keywords: abstractive summarization, language model, corpus
TL;DR: We fine-tune abstractive summarization models for Icelandic and observe much better results with the mT5 architecture than with a Pegasus model pre-trained from scratch using gap sentence generation on monolingual Icelandic data.
Abstract: In this work, we studied methods for automatic abstractive summarization in a low-resource setting using Icelandic text, which is morphologically rich and has limited data compared to languages such as English. We collected and published the first publicly available abstractive summarization dataset for Icelandic and used it for training and evaluation of our models. We found that using multilingual pre-training in this setting led to improved performance, with the multilingual mT5 model consistently outperforming a similar model pre-trained from scratch on Icelandic text only. Additionally, we explored the use of machine translations to augment the fine-tuning data and found that fine-tuning on the augmented data followed by fine-tuning on Icelandic data improved the results. This work highlights the importance of both high-quality training data and multilingual pre-training in achieving effective abstractive summarization in low-resource languages.
Student Paper: Yes, the first author is a student