ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events

ACL ARR 2025 February Submission2084 Authors

14 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) still face significant challenges in reasoning and arithmetic. Although temporal reasoning has attracted increasing research attention, comprehensive testing of Allen's interval relations (e.g., before, after, during), a fundamental framework for temporal relationships, remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs' temporal understanding. It includes 16 tasks covering both the identification of the Allen relation between two temporal events and temporal arithmetic. We assess the performance of seven recent LLMs. The results indicate that models handle Allen relations, even symmetrical ones, quite differently. Moreover, the findings suggest that the models may rely on memorization to answer time-related questions. Overall, the models' low performance highlights the need for improved temporal understanding in LLMs. Our dataset and the source code are available at https://bit.ly/chronosense
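Allen's interval algebra, which the benchmark tests, defines 13 mutually exclusive relations between two time intervals. A minimal sketch of a classifier for these relations (this is an illustration of the standard algebra, not the authors' benchmark code; the function name and interval representation are assumptions):

```python
def allen_relation(a, b):
    """Classify the Allen relation between intervals a and b.

    a and b are (start, end) tuples with start < end. Returns one of
    the 13 standard Allen relation names, e.g. 'before', 'meets',
    'during', 'equal'. Name spellings here are illustrative.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"            # a ends strictly before b starts
    if b2 < a1:
        return "after"             # b ends strictly before a starts
    if a2 == b1:
        return "meets"             # a's end touches b's start
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"            # a strictly inside b
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper overlaps remain at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"


# Example: (1, 3) overlaps (2, 5); (2, 4) is during (1, 6).
print(allen_relation((1, 3), (2, 5)))  # → overlaps
print(allen_relation((2, 4), (1, 6)))  # → during
```

Symmetrical pairs such as before/after or during/contains are logical inverses of each other, which is why the abstract's observation that models treat them differently is notable.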
Paper Type: Short
Research Area: Question Answering
Research Area Keywords: logical reasoning, reasoning
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 2084