Abstract: Over the last decade, natural language processing (NLP) has attracted significant interest because it simplifies human tasks and lets people communicate with computers in human (natural) language. Yet Bangla abstractive text summarization remains underexplored despite the language's widespread use. This work introduces the Bangla Web-based Summarization Dataset (\textit{BWSD}), a publicly available, web-sourced Bangla summarization dataset comprising 1,100 documents, alongside a custom preprocessing module for Bangla text. We also propose a freely available Bangla abstractive text summarization system and use it to evaluate BanglaT5, BanglaBERT, and mT5; mT5 achieves the highest ROUGE-2 score (22.57). Despite challenges in data availability and linguistic complexity, our approach generates coherent, concise summaries and provides essential resources for Bangla NLP research. The dataset, the custom preprocessing module, and the system are publicly available.
Paper Type: Short
Research Area: Summarization
Research Area Keywords: Resources and Evaluation, Summarization, Generation, Human-Centered NLP, Language Modeling
Contribution Types: Data resources
Languages Studied: Bengali
Submission Number: 2892