Rethinking Evaluation Strategy for Temporal Link Prediction through Counterfactual Analysis

09 May 2024 (modified: 13 Nov 2024) · Submitted to the NeurIPS 2024 Datasets and Benchmarks Track · CC BY 4.0
Keywords: temporal link prediction, dynamic graphs, temporal graphs, evaluation, counterfactual, causality
TL;DR: What if a Temporal Link Prediction model is tested on a temporally distorted version of the data instead of the real data?
Abstract: In response to critiques of existing evaluation methods for Temporal Link Prediction (TLP) models, we propose a novel approach to verify whether these models truly capture temporal patterns in the data. Our method involves a sanity check formulated as a counterfactual question: "What if a TLP model is tested on a temporally distorted version of the data instead of the real data?" Ideally, a TLP model that effectively learns temporal patterns should perform worse on temporally distorted data than on the real data. We provide an in-depth analysis of this hypothesis and introduce two data distortion techniques to assess well-known TLP models. Our contributions are threefold: (1) We introduce simple techniques for distorting temporal patterns within a graph, generating temporally distorted test splits of well-known datasets for sanity checks. These distortion methods are applicable to any temporal graph dataset. (2) We perform counterfactual analysis on TLP models such as JODIE, TGAT, TGN, and CAWN to evaluate their capability to capture temporal patterns across different datasets. (3) We propose an alternative evaluation strategy for TLP that addresses the limitations of binary classification and ranking methods, and introduce two metrics -- average time difference (ATD) and average count difference (ACD) -- to provide a comprehensive measure of a model's predictive performance. The code and datasets are available at: https://github.com/Aniq55/TLPCF.git
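
Neither the two distortion techniques nor the exact ATD/ACD formulas are spelled out in the abstract, so the following is a minimal sketch rather than the paper's method: it uses a random timestamp shuffle as a stand-in distortion and reads ATD/ACD as mean absolute errors over predicted event times and interaction counts. All function names (`shuffle_timestamps`, `average_time_difference`, `average_count_difference`) and the metric definitions are illustrative assumptions.

```python
import numpy as np

# One plausible temporal distortion (assumption: the paper's two techniques
# are not named in the abstract; a random timestamp shuffle is used purely as
# an illustration of "temporally distorted" data).
def shuffle_timestamps(t: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly permute edge timestamps: the timestamp distribution is
    preserved, but temporal ordering (and thus temporal patterns) is destroyed."""
    return rng.permutation(t)

# Hypothetical readings of the two proposed metrics as mean absolute errors;
# the paper's exact definitions may differ.
def average_time_difference(t_pred: np.ndarray, t_true: np.ndarray) -> float:
    """ATD sketch: mean |predicted event time - true event time|."""
    return float(np.mean(np.abs(t_pred - t_true)))

def average_count_difference(c_pred: np.ndarray, c_true: np.ndarray) -> float:
    """ACD sketch: mean |predicted interaction count - true count| per node pair."""
    return float(np.mean(np.abs(c_pred - c_true)))

# Toy sanity check: a "model" that predicts each next event time from the
# average inter-event gap should do worse once the timestamps are shuffled.
rng = np.random.default_rng(0)
t_real = np.cumsum(rng.exponential(1.0, size=1000))  # ordered event times
t_fake = shuffle_timestamps(t_real, rng)             # temporally distorted copy

def predict_next(t: np.ndarray) -> np.ndarray:
    mean_gap = np.mean(np.diff(t))  # "learned" temporal pattern
    return t[:-1] + mean_gap        # next-event-time prediction

atd_real = average_time_difference(predict_next(t_real), t_real[1:])
atd_fake = average_time_difference(predict_next(t_fake), t_fake[1:])
print(f"ATD on real data:      {atd_real:.3f}")
print(f"ATD on distorted data: {atd_fake:.3f}")  # expected to be much larger
```

On this toy sequence the ATD on the distorted copy is markedly higher, which is the pattern the counterfactual sanity check looks for: a model that has genuinely learned temporal structure should degrade on the distorted split.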
Submission Number: 191