Keywords: temporal knowledge graphs, large language models, multi-hop reasoning
Abstract: Large Language Model (LLM)-based methods for Temporal Knowledge Graph (TKG) reasoning tasks have found success by relying on LLMs' impressive pattern recognition abilities. However, the extent of this capability on complex multi-hop patterns remains understudied.
To understand the limits of LLM-based methods' multi-hop reasoning abilities, we create a novel synthetic TKG generator and a suite of realistic TKG datasets with varied complexity along several important dimensions. In particular, we study multi-hop patterns complicated by the number of hops, time dispersion, and imbalanced relation and entity distributions. We benchmark LLM- and Graph Neural Network (GNN)-based methods on these synthetic TKGs, finding that LLMs can far outperform GNN-based methods when provided with ideal contexts. However, their performance degrades sharply as contextual noise increases, indicating that retrieval, not multi-hop composition itself, is the primary bottleneck.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, automatic creation and evaluation of language resources, evaluation, neurosymbolic approaches, prompting
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 2093