On Class Distributions Induced by Nearest Neighbor Graphs for Node Classification of Tabular Data

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: Deep Graph Networks, Graph Neural Networks, Graph Representation Learning, Nearest Neighbors, Node Classification, Tabular Data
TL;DR: We formally study how nearest neighbor structures, historically used when a graph structure is not available, impact the performance of message-passing neural networks.
Abstract: Researchers have used nearest neighbor graphs to transform classical machine learning problems on tabular data into node classification tasks to be solved with graph representation learning methods. Such artificial structures often reflect the homophily assumption, believed to be a key factor in the performance of deep graph networks. In light of recent results demystifying these beliefs, we introduce a theoretical framework to understand the benefits of Nearest Neighbor (NN) graphs when a graph structure is missing. We formally analyze the Cross-Class Neighborhood Similarity (CCNS), used to empirically evaluate the usefulness of structures, in the context of nearest neighbor graphs. Moreover, we study the class separability induced by deep graph networks on a k-NN graph. Motivated by the theory, our quantitative experiments demonstrate that, under full supervision, employing a k-NN graph offers no benefits compared to a structure-agnostic baseline. Qualitative analyses suggest that our framework accurately estimates the CCNS and hint that k-NN graphs are never useful for such classification tasks under full supervision, thus advocating for the study of alternative graph construction techniques in combination with deep graph networks.
Supplementary Material: zip
Submission Number: 2175
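
To make the setup in the abstract concrete, below is a minimal sketch, not the authors' implementation, of the pipeline the paper studies: build a k-NN graph over tabular features and compute an empirical CCNS matrix. It assumes the common reading of CCNS as the average cosine similarity between the neighborhood label histograms of node pairs drawn from two classes; `load_iris` as a stand-in tabular dataset, `k=5`, and the helper `ccns` are illustrative choices, not the paper's.

```python
# Sketch (not the authors' code): k-NN graph over tabular data + empirical CCNS.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import kneighbors_graph


def ccns(adj, labels, num_classes):
    """Empirical CCNS: entry (c, c') is the mean cosine similarity between
    the neighborhood label histograms of nodes in classes c and c'."""
    n = adj.shape[0]
    # d[u] = histogram of the labels of u's neighbors, one row per node.
    d = np.zeros((n, num_classes))
    rows, cols = adj.nonzero()
    np.add.at(d, rows, np.eye(num_classes)[labels[cols]])
    # Cosine-normalize each histogram (guard against isolated nodes).
    norms = np.linalg.norm(d, axis=1, keepdims=True)
    d = d / np.clip(norms, 1e-12, None)
    sim = d @ d.T  # pairwise cosine similarities between all node histograms
    out = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        for cp in range(num_classes):
            out[c, cp] = sim[np.ix_(labels == c, labels == cp)].mean()
    return out


X, y = load_iris(return_X_y=True)        # stand-in tabular dataset
A = kneighbors_graph(X, n_neighbors=5)   # artificial k-NN structure
print(np.round(ccns(A, y, num_classes=3), 3))
```

Under this reading, large off-diagonal CCNS values would mean that nodes of different classes see similar neighborhood label distributions, i.e., the artificial k-NN structure is unlikely to help a message-passing model separate the classes.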