An Unsupervised Evolutionary Approach for Indian Regional Language Summarization

Jiten Parmar, Naveen Saini, Dhananjoy Dey

Published: 2024, Last Modified: 22 Jan 2026, CEC 2024, CC BY-SA 4.0

Abstract: The news domain is ever-evolving, and standardizing text summarization is especially challenging for low-resource Indian languages because of their distinct syntax and semantics. Finding an efficient method for generating concise summaries has therefore become important. In this paper, we develop an evolutionary algorithm-based approach, ILrLSUMM, that generates concise extractive summaries for low-resource Indian languages, focusing on Hindi and Gujarati. To select the relevant sentences from a document to form a summary, our method employs a single-objective optimization process built on the differential evolution algorithm, an approach that, to the best of our knowledge, is the first of its kind. We investigate three key objectives: tf-idf score, sentence-to-title similarity, and thematic score. Our approach is purely unsupervised; we applied it to 500 articles from the M3LS dataset for a broader comparative analysis with existing algorithms. Our evaluation was based on ROUGE scores, comparing our generated summaries with the gold-standard summaries in the dataset. The results are promising: our method outperformed existing techniques, including large language models (LLMs), by an average of 34% in Hindi and 53% in Gujarati in terms of ROUGE-1 F1. This substantial improvement highlights the effectiveness of our approach for summarizing text in underrepresented languages.
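The abstract does not give ILrLSUMM's encoding or objective formulas, so what follows is only a minimal sketch of how differential evolution can drive extractive sentence selection. It is not the authors' implementation. Every modelling choice here is our own assumption. Each candidate is a real-valued vector in [0, 1]^n that is decoded to its k highest-valued sentence positions. The three objectives are approximated as (i) the summary's share of the article's sentence-level tf-idf mass, (ii) the cosine similarity between the summary's bag of words and the title, and (iii) coverage of the article's five most frequent words, treated as "thematic" words. These are combined by an equal-weight sum into one scalar fitness, and the search uses the standard DE/rand/1/bin scheme. Tokenization is a naive whitespace split with punctuation stripping and no stop-word removal. All function names and parameter values (`make_fitness`, `de_summarize`, `F`, `CR`, population size) are hypothetical.

```python
import math
import random
import string
from collections import Counter

# Assumption: naive tokenizer (whitespace split, strip ASCII punctuation and the
# Devanagari danda). Real Hindi/Gujarati text needs proper tokenization/stop words.
PUNCT = string.punctuation + "।"

def tokenize(text):
    return [t for t in (w.strip(PUNCT) for w in text.lower().split()) if t]

def cosine(a, b):
    """Cosine similarity between two Counter bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sentence_tfidf(sent_tokens):
    """tf-idf score per sentence, treating the article's sentences as 'documents'."""
    n = len(sent_tokens)
    df = Counter(w for toks in sent_tokens for w in set(toks))
    scores = []
    for toks in sent_tokens:
        tf = Counter(toks)
        scores.append(sum((c / len(toks)) * math.log(n / df[w]) for w, c in tf.items())
                      if toks else 0.0)
    return scores

def make_fitness(sentences, title, n_theme=5, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical scalar fitness: weighted sum of three summary-level scores."""
    toks = [tokenize(s) for s in sentences]
    tfidf = sentence_tfidf(toks)
    total = sum(tfidf) or 1.0
    theme = {w for w, _ in Counter(w for t in toks for w in t).most_common(n_theme)}
    title_vec = Counter(tokenize(title))

    def fitness(selected):
        words = [w for i in selected for w in toks[i]]
        f_tfidf = sum(tfidf[i] for i in selected) / total   # share of tf-idf mass
        f_title = cosine(Counter(words), title_vec)         # similarity to title
        f_theme = len(theme & set(words)) / len(theme) if theme else 0.0
        w1, w2, w3 = weights
        return w1 * f_tfidf + w2 * f_title + w3 * f_theme

    return fitness

def decode(vec, k):
    """Select the k positions with the largest genes, returned in document order."""
    return sorted(sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)[:k])

def de_summarize(sentences, title, k=2, pop_size=20, gens=50, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over continuous keys; pop_size must be at least 4."""
    rng = random.Random(seed)
    n = len(sentences)
    if k >= n:
        return list(range(n))
    fit = make_fitness(sentences, title)
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    scores = [fit(decode(x, k)) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = []
            for j in range(n):
                if rng.random() < CR or j == j_rand:  # binomial crossover
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            s = fit(decode(trial, k))
            if s >= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda i: scores[i])
    return decode(pop[best], k)

if __name__ == "__main__":
    # Toy English article, for illustration only.
    article = [
        "The monsoon brought heavy rain to Gujarat this week.",
        "Rivers in the state rose above danger levels.",
        "A cricket match in Mumbai was also postponed.",
        "Officials said rain relief camps were opened in Gujarat.",
    ]
    print(de_summarize(article, "Heavy monsoon rain in Gujarat", k=2))
```

The random-key encoding with top-k decoding is one common way to keep every candidate a valid fixed-length summary. It lets standard continuous DE operators be used unchanged. Because the title-similarity and thematic-coverage terms are computed over the whole summary rather than per sentence, the fitness is not a simple per-sentence sum, which is what makes a population-based search worthwhile here.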

External IDs: dblp:conf/cec/ParmarSD24