Keywords: Cultural temporal reasoning, non-Gregorian calendars, calendar conversion
TL;DR: Current state-of-the-art LLMs almost completely fail to reason about or compute dates in non-Gregorian calendar systems, revealing a deep Western-centric bias in how time is represented and learned.
Abstract: Large Language Models (LLMs) demonstrate strong temporal reasoning and historical fact retrieval, yet existing benchmarks rely almost exclusively on the Gregorian calendar, implicitly treating Western temporal standards as universal. This Gregorian-centric framing obscures a critical limitation: current foundation models fail to reason reliably within culturally diverse, non-Gregorian calendar systems used by billions worldwide.
We introduce a diagnostic benchmark for temporal reasoning across five major cultural calendars: **Vikram Samvat, Persian (Jalali), Hijri, Chinese Lunar, and Hebrew**. The benchmark evaluates two core capabilities: *Event Date Retrieval*, measuring factual grounding in indigenous timelines, and *Date Arithmetic*, probing structural reasoning over non-linear temporal constructs such as intercalary months and lunar cycles.
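As a concrete illustration of the non-linear structure that the *Date Arithmetic* task probes, consider intercalation in the Hebrew lunisolar calendar: 7 of every 19 years (the Metonic cycle) insert a leap month (Adar I), so the number of months per year is not constant. The sketch below (function names are ours, not the benchmark's) uses the standard closed-form leap-year rule:

```python
def is_hebrew_leap_year(year: int) -> bool:
    # The Hebrew calendar intercalates a 13th month (Adar I) in
    # years 3, 6, 8, 11, 14, 17, and 19 of each 19-year Metonic
    # cycle; this closed form selects exactly those years.
    return (7 * year + 1) % 19 < 7

def months_in_hebrew_year(year: int) -> int:
    # Month-level arithmetic must branch on intercalation --
    # a structure a linear Gregorian offset cannot capture.
    return 13 if is_hebrew_leap_year(year) else 12

# Leap years within one 19-year cycle:
print([y for y in range(1, 20) if is_hebrew_leap_year(y)])
# → [3, 6, 8, 11, 14, 17, 19]
```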
Evaluating several open-weight models, including *Gemma-3*, *DeepSeek-V3*, and *Qwen-32B*, reveals pronounced performance disparities. While reasoning-optimized models such as *DeepSeek-R1* show localized competence in solar calendars (e.g., Persian), performance collapses for lunisolar and purely lunar systems. Models consistently exhibit a Gregorian anchoring effect, defaulting to linear offsets or Western mathematical heuristics even when prompted within alternative calendar frameworks.
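The Gregorian anchoring effect can be made concrete with a toy example (helper names are ours; the exact Gregorian date of Nowruz shifts by about a day from year to year). A fixed linear offset between Gregorian and Persian (Jalali) year numbers ignores the fact that the Persian year increments at Nowruz rather than on January 1:

```python
from datetime import date

def naive_persian_year(g: date) -> int:
    # Gregorian-anchored heuristic: a fixed linear offset of 621.
    return g.year - 621

def persian_year(g: date) -> int:
    # The Persian (Jalali) year actually increments at Nowruz
    # (approximately March 21 Gregorian), so the offset is 621
    # on or after Nowruz and 622 before it.
    nowruz = date(g.year, 3, 21)  # approximation; varies by a day
    return g.year - (621 if g >= nowruz else 622)

# The heuristic agrees after Nowruz but is off by one before it:
print(naive_persian_year(date(2024, 2, 1)))  # 1403 (wrong)
print(persian_year(date(2024, 2, 1)))        # 1402
```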
These findings expose a deep-seated Gregorian bias in foundation models, suggesting that temporal reasoning is often memorized rather than structurally learned. Our work identifies a key bottleneck in cultural alignment and provides a rigorous framework for developing more inclusive and robust temporal reasoning systems.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 25