HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew


16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone
Abstract: While large language models (LLMs) excel at various natural language tasks in English, their performance in low-resource languages like Hebrew, especially on complex tasks like abstractive summarization, remains unclear. Hebrew's morphological richness adds further challenges due to ambiguity in sentence structure and word meaning. In this paper, we address this gap by introducing HeSum, a novel benchmark dataset specifically designed for Hebrew abstractive text summarization. HeSum comprises 10,000 article-summary pairs sourced from Hebrew news websites and written by professionals. Linguistic analysis confirms HeSum's high abstractness and unique morphological challenges. We show that HeSum presents distinct difficulties even for state-of-the-art LLMs, establishing it as a valuable testbed for advancing language generation in morphologically rich languages (MRLs) such as Hebrew.
Paper Type: short
Research Area: Resources and Evaluation
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: Hebrew
