Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks

Published: 27 Jan 2023, Last Modified: 28 Feb 2023. Accepted by TMLR.
Abstract: Multiple sampling-based methods have been developed for approximating and accelerating node embedding aggregation in graph convolutional network (GCN) training. Among them, a layer-wise approach recursively performs importance sampling to select neighbors jointly for the existing nodes in each layer. This paper revisits the approach from a matrix approximation perspective and identifies two issues in existing layer-wise sampling methods: suboptimal sampling probabilities and estimation biases induced by sampling without replacement. To address these issues, we propose two remedies: a new principle for constructing sampling probabilities and an efficient debiasing algorithm. The improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks. Code and algorithm implementations are publicly available at \url{}.
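To make the matrix approximation perspective concrete, the following is a minimal sketch of importance sampling applied to an aggregation product P @ H: columns of P (with the matching rows of H) are drawn with replacement and reweighted so that the estimate is unbiased. It is not the paper's proposed probabilities or its debiasing algorithm. The function names `column_norm_probs` and `sampled_aggregate` are hypothetical, and the squared-column-norm probabilities are an assumed baseline chosen for illustration.

```python
import random


def column_norm_probs(P):
    """Sampling probabilities proportional to the squared column norms of P.

    Assumption: P is a non-empty list of rows and has at least one nonzero entry.
    """
    n_cols = len(P[0])
    sq_norms = [sum(row[k] ** 2 for row in P) for k in range(n_cols)]
    total = sum(sq_norms)
    return [v / total for v in sq_norms]


def sampled_aggregate(P, H, probs, s, rng):
    """Estimate P @ H from s column indices drawn WITH replacement.

    Each draw k contributes P[:, k] H[k, :] / (s * probs[k]), so the
    expectation of the returned matrix equals P @ H.
    """
    n_rows, n_feat = len(P), len(H[0])
    est = [[0.0] * n_feat for _ in range(n_rows)]
    drawn = rng.choices(range(len(probs)), weights=probs, k=s)
    for k in drawn:
        weight = 1.0 / (s * probs[k])  # importance weight for index k
        for i in range(n_rows):
            for j in range(n_feat):
                est[i][j] += weight * P[i][k] * H[k][j]
    return est


if __name__ == "__main__":
    rng = random.Random(0)
    P = [[rng.random() for _ in range(5)] for _ in range(4)]  # toy aggregation matrix
    H = [[rng.random() for _ in range(3)] for _ in range(5)]  # toy node embeddings
    probs = column_norm_probs(P)
    print(sampled_aggregate(P, H, probs, s=3, rng=rng))
```

Because indices are drawn with replacement here, the reweighted estimate is unbiased by construction. Drawing distinct indices while keeping the same weights would generally break that property, which is the kind of bias the abstract attributes to sampling without replacement.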
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Dear Editor Zaheer,

We have finished the revision.

> 1. Add limitation about the proposed method not working for FastGCN. It can't be just sparsity of FastGCN is the cause, as adding the debiased sampling almost always hurts FastGCN's performance.

We added some conjectures at the end of Page 12. In addition to the sparsity, we also attribute the poor performance to the main motivation in LADIES: the probabilities in FastGCN, based on $\|\mathbf P\|$, do not capture the dynamics of mini-batch SGD as well as the ones in LADIES, based on $\|\mathbf Q \mathbf P\|$. Since the FastGCN model does not converge well, debiasing inferior probabilities cannot bring significant improvements.

> 2. As pointed out by 31rs, it would be instructive for readers to write about why proposed method with flattening still works better than LADIES on datasets where proportionality assumption holds? Is it better hyper-parameter tuning?

We comment at the beginning of Page 6 that the proportionality assumption is in fact violated on all the datasets involved. Our evidence is two-fold: the regression coefficients are negative, or the regression $R^2$ values are small. A positive regression coefficient with a small $R^2$ implies only a weak correlation, not a "proportionality" relationship. More information and discussion regarding $R^2$ are provided in Table 2 and the paragraph that follows it.

> 3. Rename appendix D.3, it can't be titled author response.

We have reorganized the original Appendix D and renamed it "Supplementary Regression Results". We also renamed the subsections in Appendix D and moved some of the content to other sections of the appendix.

---

Again, many thanks to the editors and the reviewers for their time and help.

Regards,
Paper 256 authors
Assigned Action Editor: ~Manzil_Zaheer1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 256