Toward Data-centric Directed Graph Learning: An Entropy-driven Approach

Xunkai Li; Zhengyu Wu; Kaichi Yu; Hongchao Qin; Guang Zeng; Rong-Hua Li; Guoren Wang

Published: 01 May 2025, Last Modified: 23 Jul 2025. Venue: ICML 2025 poster. License: CC BY-ND 4.0

TL;DR: The first attempt to fully utilize the potential of data to empower directed graph learning through data-centric machine learning.

Abstract: Although directed graphs (digraphs) offer strong modeling capabilities for complex topological systems, existing DiGraph Neural Networks (DiGNNs) struggle to fully capture the rich structural information concealed in them. This data-level limitation leads to sub-optimal predictive performance at the model level and underscores the need to further explore the potential correlations between directed edges (topology) and node profiles (features and labels) from a data-centric perspective, thereby giving model-centric neural networks stronger encoding capabilities. In this paper, we propose **E**ntropy-driven **D**igraph knowl**E**dge distillatio**N** (EDEN), which can serve either as a data-centric digraph learning paradigm or as a model-agnostic, plug-and-play data-centric Knowledge Distillation (KD) module. EDEN implements data-centric machine learning by constructing a coarse-grained Hierarchical Knowledge Tree (HKT) using the proposed hierarchical encoding theory, and then refining the HKT through mutual information analysis of node profiles to guide knowledge distillation during training. As a general framework, EDEN naturally extends to undirected graphs and consistently delivers strong performance. Extensive experiments on 14 (di)graph datasets, spanning both homophily and heterophily settings, and across four downstream tasks show that EDEN achieves state-of-the-art (SOTA) results and significantly enhances existing (Di)GNNs.
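To make the phrase "mutual information analysis of node profiles" more concrete, the following is a minimal sketch of how mutual information between a coarse grouping of nodes and their labels can be measured. It is not EDEN's algorithm: the function name `mutual_information`, the toy parent assignments, and the labels are all hypothetical and chosen only for illustration. The idea is that a grouping whose parent nodes line up with the class labels carries more information about those labels than a grouping that mixes them.

```python
from collections import Counter
from math import log2


def mutual_information(groups, labels):
    """Return I(G; Y) in bits for two equal-length discrete sequences."""
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    n = len(groups)
    joint = Counter(zip(groups, labels))  # counts of (group, label) pairs
    group_counts = Counter(groups)        # marginal counts of groups
    label_counts = Counter(labels)        # marginal counts of labels
    mi = 0.0
    for (g, y), count in joint.items():
        # p(g, y) * log2( p(g, y) / (p(g) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (group_counts[g] * label_counts[y]))
    return mi


# Hypothetical toy data: six leaf nodes, each assigned to a parent node
# in a two-level tree, together with a class label per leaf.
labels = [0, 0, 0, 1, 1, 1]
parents_aligned = ["A", "A", "A", "B", "B", "B"]  # parents match the labels
parents_mixed = ["A", "B", "A", "B", "A", "B"]    # parents cut across labels

print(mutual_information(parents_aligned, labels))
print(mutual_information(parents_mixed, labels))
```

In this toy case the aligned grouping attains the full one bit of label entropy, while the mixed grouping scores much lower. A refinement step could, under these assumptions, prefer groupings with the higher score.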

Lay Summary: We improve how computers learn from directed graphs: networks where connections have direction, like a one-way street. These structures, called digraphs, are well suited to modeling real-world relationships such as information flow or social interactions, but their complexity makes them harder to process. Instead of designing more complex models, we take a data-centric approach. Our method, EDEN, focuses on improving the graph data itself by uncovering deeper patterns within its directional structure. EDEN does two key things. First, it separates the graph’s structure from node-specific data, such as features or labels. Second, it builds a tree-like structure in which the original nodes sit at the bottom as leaves; from this tree, EDEN discovers higher-level concepts that sit above the leaves as parent nodes, capturing the hidden hierarchy within the data. This hierarchical organization reduces structural noise by removing weak or misleading links and suggesting meaningful new ones. As a result, EDEN produces cleaner, more informative digraphs that lead to better machine learning performance. By focusing on the data rather than model complexity, EDEN offers a powerful way to extract value from the structure of directed graphs.

Primary Area: Deep Learning->Graph Neural Networks

Keywords: Data Knowledge Distillation, Graph Neural Network, Directed Graph Learning

Submission Number: 2302
