Keywords: optimization, Adam, SGD, graph neural networks, data normalization
TL;DR: The data normalization standard in the graph neural network community causes Adam to perform better than (S)GD, but a better-motivated, simpler normalization improves both.
Abstract: We show that feature normalization has a drastic impact on the performance of optimization algorithms in the context of graph neural networks. The standard normalization scheme used throughout the graph neural network literature is not motivated from an optimization perspective and frequently causes (S)GD to fail. Adam does not fail, but is also negatively impacted by standard normalization methods. We show across multiple datasets and models that a better-motivated feature normalization closes the gap between Adam and (S)GD, and speeds up optimization for both.
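The abstract does not spell out either normalization scheme. As a point of reference only, the minimal NumPy sketch below contrasts the row normalization common in GNN reference implementations with per-feature standardization, a classical optimization-motivated alternative; both schemes and all names here are illustrative assumptions, not the paper's actual code or proposal.

```python
# Sketch (assumed, not the paper's method): row normalization as seen in
# many GNN repositories vs. per-feature standardization.
import numpy as np

def row_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each node's feature vector to sum to 1 (common in GNN code)."""
    rowsum = X.sum(axis=1, keepdims=True)
    rowsum[rowsum == 0] = 1.0  # avoid division by zero for all-zero rows
    return X / rowsum

def standardize(X: np.ndarray) -> np.ndarray:
    """Zero mean, unit variance per feature: a classical optimization-friendly choice."""
    mean = X.mean(axis=0, keepdims=True)
    std = X.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # keep constant features finite
    return (X - mean) / std

# Example with a Cora-sized node-feature matrix (hypothetical data).
X = np.random.rand(2708, 1433)
X_row = row_normalize(X)
X_std = standardize(X)
```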
Submission Number: 107