RUATS: Abstractive Text Summarization for Roman Urdu

Laraib Kaleem, Arif Ur Rahman, Momina Moetesum

Published: 01 Jan 2024, Last Modified: 29 Oct 2025, License: CC BY-SA 4.0

Abstract: Recent advances in text summarization primarily target high-resource languages. However, their performance on low-resource and unstructured languages such as Roman Urdu (RU) has not yet been evaluated. This research evaluates abstractive summarization of Roman Urdu text, which is commonly used for social media communication in Urdu-speaking communities. Because relevant datasets are scarce, a corpus of Roman Urdu text is generated by transliterating samples collected from two benchmark Urdu abstractive text summarization datasets. Baseline summaries are then generated using two state-of-the-art (SOTA) transformer-based models: Bidirectional Encoder Representations from Transformers (BERT) and Text-To-Text Transfer Transformer (T5). The summaries generated by both models are evaluated using several intrinsic and extrinsic methods. The experimental results show that T5 outperforms BERT in generating abstractive summaries of Roman Urdu text. Nonetheless, more research is required in this direction.

External IDs: doi:10.1007/978-3-031-70442-0_16