ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events

ACL ARR 2025 February Submission2084 Authors

14 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) still face significant challenges in reasoning and arithmetic. Although temporal reasoning has attracted increasing research attention, comprehensive testing of Allen's interval relations (e.g., before, after, during), a fundamental framework for temporal relationships, remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs' temporal understanding. It includes 16 tasks covering both the identification of the Allen relation between two temporal events and temporal arithmetic. We assess the performance of seven recent LLMs. The results indicate that models handle Allen relations, even symmetrical ones, quite differently. Moreover, the findings suggest that the models may rely on memorization to answer time-related questions. Overall, the models' low performance highlights the need for improved temporal understanding in LLMs. Our dataset and the source code are available at https://bit.ly/chronosense
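Allen's interval algebra, which the benchmark tests, defines 13 mutually exclusive relations between two time intervals. A minimal sketch of a classifier for these relations (this is an illustration of the standard algebra, not the authors' benchmark code; the function name and interval representation are assumptions):

```python
def allen_relation(a, b):
    """Classify the Allen relation between intervals a and b.

    a and b are (start, end) tuples with start < end. Returns one of
    the 13 standard Allen relation names, e.g. 'before', 'meets',
    'during', 'equal'. Name spellings here are illustrative.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"            # a ends strictly before b starts
    if b2 < a1:
        return "after"             # b ends strictly before a starts
    if a2 == b1:
        return "meets"             # a's end touches b's start
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"            # a strictly inside b
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper overlaps remain at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"


# Example: (1, 3) overlaps (2, 5); (2, 4) is during (1, 6).
print(allen_relation((1, 3), (2, 5)))  # → overlaps
print(allen_relation((2, 4), (1, 6)))  # → during
```

Symmetrical pairs such as before/after or during/contains are logical inverses of each other, which is why the abstract's observation that models treat them differently is notable.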
Paper Type: Short
Research Area: Question Answering
Research Area Keywords: logical reasoning, reasoning
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 2084