Making Sense of Korean Sentences: A Comprehensive Evaluation of LLMs through KoSEnd Dataset

ACL ARR 2024 December Submission 2127 (anonymous authors)

16 Dec 2024 (modified: 05 Feb 2025) | License: CC BY 4.0
Abstract: Although LLMs have made significant progress in handling many languages, concerns remain about their effectiveness on low-resource agglutinative languages compared with languages such as English. In this study, we focus on Korean, a language known for its complex sentence endings, and evaluate LLMs on this challenging aspect. We introduce the Korean Sentence Endings (KoSEnd) dataset, which contains 3,000 sentences collected from diverse sources to cover a wide range of contexts, annotated with 45,000 sentence-ending labels. We evaluate 11 models on their understanding of Korean sentence endings and analyze the results by parameter count and prediction consistency. Notably, informing the models that a sentence ending might be missing improved their performance, which highlights the effect of explicitly directing models toward specific linguistic features.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Sentence Ending, Agglutinative Language, LLM, Korean
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Korean
Submission Number: 2127