Evaluating Large Language Models for Summarizing Bangla Texts

Published: 02 Aug 2024, Last Modified: 12 Nov 2024, Venue: WiNLP 2024, License: CC BY 4.0
Keywords: Bangla Texts, Summarization, LLM, GPT, Human Evaluation
TL;DR: In this study, we evaluate five LLMs on two popular Bangla news summarization datasets and complement the results with human evaluation.
Abstract: Large Language Models (LLMs) can summarize Bangla text, condensing it while preserving key information by leveraging advanced Natural Language Processing (NLP) techniques. In this study, we evaluated five LLMs on two popular Bangla news summarization datasets and complemented the results with human evaluation. We made two key observations. First, GPT-4 in a zero-shot setting performs well on Bangla news summarization. Second, previous research has been constrained by low-quality references, resulting in an underestimation of human performance and diminished few-shot capabilities. To evaluate LLMs more accurately, we performed human assessments using high-quality summaries created by student writers. Despite notable stylistic differences, including the extent of paraphrasing, LLM-generated summaries were found to be comparable to those written by humans. Our model was assessed both qualitatively and quantitatively, and comparisons with other published results showed significant improvements in human evaluation scores attributable to the LLM techniques.
Submission Number: 8