LCHAIM - Investigating Long Context Reasoning in Hebrew

ACL ARR 2025 February Submission2020 Authors

14 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Natural Language Inference (NLI) has gained significant attention recently due to its importance in understanding how machines comprehend and reason about language. While English has received tremendous interest, Morphologically Rich Languages (MRLs) like Hebrew require more research. In this paper, we address the evaluation of Hebrew NLI models by introducing LCHAIM, a dataset designed to evaluate these models on tasks involving long premises and complex reasoning. The dataset, created by translating and validating the English ConTRoL dataset, consists of 8,325 context-hypothesis pairs that require coreferential, temporal, logical, and analytical reasoning. Our experiments show the difficulty of contextual reasoning in Hebrew, as evidenced by the performance of different models. Fine-tuning the LongHero model on both the shorter-premise Hebrew NLI and the LCHAIM datasets yielded a mean accuracy of 52%, which is 35% below human performance. Similarly, Large Language Models (LLMs) like Gemma-9B, Dicta-LM-2.0-7B, and GPT-4o achieved a top mean accuracy of 60.12% in a few-shot setting.

Paper Type: Long

Research Area: Semantics: Lexical and Sentence-Level

Research Area Keywords: natural language inference, morphologically rich languages, Hebrew, contextual reasoning

Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources

Languages Studied: Hebrew, English

Submission Number: 2020
