Gradient Scarcity in Graph Learning with Bilevel Optimization

Published: 14 Jun 2024, Last Modified: 14 Jun 2024. Accepted by TMLR.
Abstract: Gradient scarcity emerges when learning graphs by minimizing a loss on a subset of nodes in the semi-supervised setting: edges between unlabeled nodes that are far from the labeled ones receive zero gradients. The phenomenon was first described when jointly optimizing the graph and the parameters of a shallow Graph Neural Network (GNN) using a single loss function. In this work, we give a precise mathematical characterization of this phenomenon and prove that it also emerges in bilevel optimization. While for GNNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with Laplacian regularization, as gradients decrease exponentially in amplitude with distance to labeled nodes, despite the infinite receptive field of this model. We study several solutions to this issue, including latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, and reducing the graph diameter by optimizing for a larger set of edges. Our empirical results validate our analysis and show that this issue also occurs with the Approximate Personalized Propagation of Neural Predictions (APPNP), which approximates a model with an infinite receptive field.
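The finite-receptive-field mechanism described in the abstract can be illustrated with a toy example. The sketch below is not the paper's model or code: it assumes a path graph with scalar node features, a simplified two-step linear propagation with unit self-loops in place of a GNN, a single labeled node, and central finite differences in place of automatic differentiation. All names (`propagate`, `loss`, `edge_gradients`) are hypothetical.

```python
def propagate(weights, h):
    """One linear propagation step on a path graph.

    weights[i] is the weight of the edge (i, i + 1); every node also keeps
    its own value (a self-loop of weight 1). This is an assumed toy layer.
    """
    out = list(h)
    for i, w in enumerate(weights):
        out[i] += w * h[i + 1]
        out[i + 1] += w * h[i]
    return out


def loss(weights, x, num_layers=2, labeled=0, target=0.0):
    """Squared error on the single labeled node after num_layers steps."""
    h = x
    for _ in range(num_layers):
        h = propagate(weights, h)
    return (h[labeled] - target) ** 2


def edge_gradients(weights, x, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(weights[k])."""
    grads = []
    for k in range(len(weights)):
        up = list(weights)
        up[k] += eps
        down = list(weights)
        down[k] -= eps
        grads.append((loss(up, x) - loss(down, x)) / (2 * eps))
    return grads


# Path graph 0 - 1 - 2 - 3 - 4 - 5, with node 0 as the only labeled node.
x = [0.5, -1.0, 2.0, 0.3, 1.5, -0.7]
weights = [1.0] * 5
grads = edge_gradients(weights, x)

for k, g in enumerate(grads):
    print(f"edge ({k}, {k + 1}): gradient = {g:.4f}")
```

With two propagation steps, only the edges (0, 1) and (1, 2) lie within the receptive field of the labeled node, so in this toy setup they are the only ones with a nonzero gradient; the loss does not depend on the remaining edge weights at all.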
Certifications: Featured Certification
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url:
Changes Since Last Submission: The previous version of the article was desk-rejected because it was not appropriately anonymized: the text contained a link to a non-anonymous GitHub repository. In this version, we replaced this link with an anonymous one.
Assigned Action Editor: ~Rémi_Flamary1
Submission Number: 2165