Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning

ACL ARR 2025 May Submission 4775 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Modern BPE tokenizers often split calendar dates into meaningless fragments, e.g., “20250312” $\rightarrow$ “202”, “503”, “12”, inflating token counts and obscuring the inherent structure needed for robust temporal reasoning. In this work, we (1) introduce a simple yet interpretable metric, termed the date fragmentation ratio, that measures how faithfully a tokenizer preserves multi-digit date components; (2) release DateAugBench, a suite of 6500 examples spanning three temporal reasoning tasks: context-based date resolution, format-invariance puzzles, and date arithmetic across historical, contemporary, and future regimes; and (3) through layer-wise probing and causal attention-hop analyses, uncover an emergent date-abstraction mechanism whereby large language models sequentially assemble month, day, and year fragments into a unified “date” concept. Our experiments show that excessive fragmentation correlates with accuracy drops of up to 10 points on uncommon dates, such as historical and future dates. Further, we find that the larger the model, the earlier the emergent date abstraction that heals date fragments is completed. Lastly, we characterize the reasoning path LLMs follow to interpret dates: they rely on the subword fragments that statistically represent the year, month, and day, and stitch these fragments together in a flexible order that depends on the date format.
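The fragmentation phenomenon the abstract describes is easy to reproduce. The sketch below is illustrative only: the paper's exact definition of the date fragmentation ratio is not given here, so the ratio computed below is an assumed definition (the fraction of date components, i.e., year, month, and day, whose digit spans are split across token boundaries), and the `cl100k_base` encoding from `tiktoken` is used merely as one example BPE tokenizer.

```python
# Illustrative sketch, not the paper's implementation.
# Assumed metric: share of date components (year, month, day) that are NOT
# contained within a single BPE token. Requires `pip install tiktoken`.
import tiktoken


def fragmentation_ratio(date: str,
                        components: list[tuple[int, int]],
                        encoding: str = "cl100k_base") -> float:
    enc = tiktoken.get_encoding(encoding)
    tokens = [enc.decode_single_token_bytes(t).decode("utf-8", "replace")
              for t in enc.encode(date)]

    # Character span [start, end) covered by each token.
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)

    def intact(start: int, end: int) -> bool:
        # A component is intact if a single token span fully covers it.
        return any(s <= start and end <= e for s, e in spans)

    broken = sum(not intact(s, e) for s, e in components)
    return broken / len(components)


# "20250312": year = chars [0,4), month = [4,6), day = [6,8).
# cl100k_base typically yields "202" / "503" / "12", so the year is split
# across tokens and the assumed ratio is 1/3.
print(fragmentation_ratio("20250312", [(0, 4), (4, 6), (6, 8)]))
```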
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: subword representations, probing, representation learning, pre-training, data augmentation, robustness, tokenization, temporal reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4775