Enhancing temporal commonsense understanding using disentangled attention-based method with a hybrid data framework

Published: 18 Mar 2025 · Last Modified: 27 Aug 2025 · Intelligence & Robotics · CC BY 4.0
Abstract: Understanding and capturing temporal relationships between time-related events expressed in text is a crucial aspect of natural language understanding (NLU). Although transformer-based pre-trained language models such as bidirectional encoder representations from transformers (BERT) have achieved significant success in various natural language processing (NLP) tasks, they are still believed to underperform on temporal commonsense tasks due to the limitations of vanilla self-attention. This paper proposes a methodology for developing language models that better understand temporal commonsense reasoning across several tasks. The proposed framework integrates a multi-data hybrid curation approach for dataset preparation, a collaborative synthetic dataset generation process involving chat agents and human domain experts, and a multi-stage fine-tuning strategy that leverages curated, intermediate, and target datasets to enhance temporal commonsense reasoning capabilities. The models used in our methodology benefit from an advanced attention mechanism and from effective use of this framework. Specifically, they employ disentangled attention with relative position encoding, which proves crucial for temporal commonsense because it captures temporal cues and indicators efficiently. Our extensive experiments show that models built with the proposed methodology improve results across several temporal commonsense categories. Our results show that we achieve better performance than previously published work by combining the disentangled attention mechanism with the hybrid data framework. Most notably, our approach achieves state-of-the-art (SOTA) results, surpassing all previous studies on temporal commonsense on the MC-TACO dataset.
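For illustration, the sketch below shows the kind of disentangled attention with relative position encoding the abstract refers to (in the spirit of DeBERTa's content-to-content, content-to-position, and position-to-content terms). It is a minimal, single-head simplification under assumed names and dimensions (`hidden_dim`, `max_rel_pos`), not the paper's actual implementation.

```python
# Minimal sketch of disentangled attention with relative position encoding.
# All names (hidden_dim, max_rel_pos) and the single-head simplification are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledSelfAttention(nn.Module):
    def __init__(self, hidden_dim: int = 64, max_rel_pos: int = 16):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.max_rel_pos = max_rel_pos
        # Content projections
        self.q_c = nn.Linear(hidden_dim, hidden_dim)
        self.k_c = nn.Linear(hidden_dim, hidden_dim)
        self.v_c = nn.Linear(hidden_dim, hidden_dim)
        # Relative position embeddings and their projections
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, hidden_dim)
        self.q_r = nn.Linear(hidden_dim, hidden_dim)
        self.k_r = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        bsz, seq_len, _ = x.shape
        qc, kc, vc = self.q_c(x), self.k_c(x), self.v_c(x)

        # Clipped relative distances delta(i, j), shape (seq_len, seq_len)
        pos = torch.arange(seq_len, device=x.device)
        delta = (pos[:, None] - pos[None, :]).clamp(-self.max_rel_pos, self.max_rel_pos)
        rel = self.rel_emb(delta + self.max_rel_pos)  # (seq_len, seq_len, hidden_dim)

        # Content-to-content term
        c2c = torch.einsum("bid,bjd->bij", qc, kc)
        # Content-to-position term: query content scored against key's relative position
        c2p = torch.einsum("bid,ijd->bij", qc, self.k_r(rel))
        # Position-to-content term: key content scored against query's relative position
        p2c = torch.einsum("bjd,ijd->bij", kc, self.q_r(rel))

        scores = (c2c + c2p + p2c) / (3 * self.hidden_dim) ** 0.5
        attn = F.softmax(scores, dim=-1)
        return torch.einsum("bij,bjd->bid", attn, vc)

# Tiny usage example on random features standing in for token representations.
out = DisentangledSelfAttention()(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because the relative-position terms score each token pair by their distance rather than absolute positions, such a mechanism can in principle pick up temporal cue words relative to the events they modify, which is the intuition the abstract appeals to.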