HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone

Abstract: While large language models (LLMs) excel in various natural language tasks for English, their performance in low-resource languages like Hebrew, especially for complex tasks like abstractive summarization, remains unclear. Hebrew's morphological richness adds further challenges due to ambiguity in sentence structure and word meaning. In this paper, we address this gap by introducing HeSum, a novel benchmark dataset specifically designed for Hebrew abstractive text summarization. HeSum comprises 10,000 article-summary pairs sourced from Hebrew news websites and written by professionals. Linguistic analysis confirms HeSum's high abstractness and unique morphological challenges. We show that HeSum presents distinct difficulties even for state-of-the-art LLMs, establishing it as a valuable testbed for advancing language generation in morphologically rich languages (MRLs) such as Hebrew.

Paper Type: short

Research Area: Resources and Evaluation

Contribution Types: Approaches to low-resource settings, Data resources, Data analysis

Languages Studied: Hebrew
