Evaluating LLMs for Portuguese Sentence Simplification with Linguistic Insights

ACL ARR 2025 February Submission5798 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · License: CC BY 4.0
Abstract: Sentence simplification (SS) aims to adapt sentences to improve their readability and accessibility. While large language models (LLMs) match task-specific baselines in English SS, their performance in Portuguese remains underexplored. This paper presents a comprehensive comparison of 26 state-of-the-art LLMs on Portuguese SS, alongside two simplification models trained specifically for this task and language. All models are evaluated in a one-shot setting on scientific, news, and government datasets. We benchmark the models on our newly introduced Gov-Lang-BR corpus (1,703 complex-simple sentence pairs from Brazilian government agencies) and on two established datasets: PorSimplesSent and Museum-PT. Our investigation combines automatic metrics with large-scale linguistic analysis to examine the transformations performed by the LLMs. Furthermore, a qualitative assessment of selected generated outputs provides deeper insight into simplification quality. Our findings reveal that while open-source LLMs achieve impressive results, closed-source LLMs continue to outperform them in Portuguese SS.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: text simplification, NLG, low-resource language and models, large language models, Portuguese sentence simplification
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: Portuguese
Submission Number: 5798