Keywords: optimization, Adam, SGD, graph neural networks, data normalization
TL;DR: The data normalization standard in the graph neural network community causes Adam to perform better than (S)GD, but a better-motivated, simpler normalization improves both.
Abstract: We show that feature normalization has a drastic impact on the performance of optimization algorithms in the context of graph neural networks. The standard normalization scheme used throughout the graph neural network literature is not motivated from an optimization perspective and frequently causes (S)GD to fail. Adam does not fail, but is also negatively impacted by standard normalization methods. We show across multiple datasets and models that a better-motivated feature normalization closes the gap between Adam and (S)GD, and speeds up optimization for both.
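The abstract does not spell out either normalization scheme. As a point of reference only, the minimal NumPy sketch below contrasts the row normalization common in GNN reference implementations with per-feature standardization, a classical optimization-motivated alternative; both schemes and all names here are illustrative assumptions, not the paper's actual code or proposal.

```python
# Sketch (assumed, not the paper's method): row normalization as seen in
# many GNN repositories vs. per-feature standardization.
import numpy as np

def row_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each node's feature vector to sum to 1 (common in GNN code)."""
    rowsum = X.sum(axis=1, keepdims=True)
    rowsum[rowsum == 0] = 1.0  # avoid division by zero for all-zero rows
    return X / rowsum

def standardize(X: np.ndarray) -> np.ndarray:
    """Zero mean, unit variance per feature: a classical optimization-friendly choice."""
    mean = X.mean(axis=0, keepdims=True)
    std = X.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # keep constant features finite
    return (X - mean) / std

# Example with a Cora-sized node-feature matrix (hypothetical data).
X = np.random.rand(2708, 1433)
X_row = row_normalize(X)
X_std = standardize(X)
```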
Submission Number: 107