Controlling Generalization Error via Gaussian Width Gradient Complexity

Sep 29, 2021 · ICLR 2022 Conference Desk Rejected Submission · Readers: Everyone
  • Keywords: PAC Bayesian, Generalizability, Learnability, Statistical Learning Theory
  • Abstract: A learnable problem is characterized by its formulation as an asymptotic empirical risk minimization problem that can be stably optimized. In this work, we develop a consistent approach for defining such an asymptotic empirical risk estimator, which is, in addition, provably generalizable, by constructing a regularizer based on Gaussian Width Gradient (GWG) complexity. Using the Majorising Measures theorem, we show that the GWG complexity is the optimal local measure of functional complexity for controlling the generalization error of a learning problem, capturing the fine-grained interaction between the data distribution and the risk functional. As an important corollary, this notion of complexity tightly controls the log-Sobolev inequality, which implies that loss distributions of simple models in this regime admit good mean-field approximations. Further, it ensures provably fast convergence of stable optimization algorithms such as the unadjusted Langevin algorithm, stochastic gradient descent, and their variants. We compare the empirical generalization performance of the derived regularizer against the popular L^{1} and L^{2} regularizers on various regression and classification tasks using general linear models, Gaussian processes, and ResNet-type deep CNN architectures.
  • One-sentence Summary: Controlling generalization error using a fine-grained notion of local function complexity: we construct a regularizer that optimally captures the interaction between the data distribution and the risk functional in order to control generalization error.
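The abstract does not specify the GWG regularizer's formula, but it frames the comparison against standard L^{1} and L^{2} penalties. As a purely hypothetical sketch, the snippet below contrasts those two classical penalties with a gradient-norm-based penalty (a stand-in for a "gradient complexity" term; the names `l1_penalty`, `l2_penalty`, and `grad_norm_penalty` are illustrative and not from the submission):

```python
import numpy as np

# Hypothetical illustration only: the GWG regularizer is not defined in this
# abstract. As a stand-in for a gradient-based complexity term, we penalize
# the average norm of the per-example loss gradients, alongside the standard
# L1 and L2 penalties the submission compares against.

def l1_penalty(w):
    # Sum of absolute weights (the L^1 regularizer).
    return np.sum(np.abs(w))

def l2_penalty(w):
    # Sum of squared weights (the L^2 regularizer).
    return np.sum(w ** 2)

def grad_norm_penalty(w, X, y):
    # Per-example squared-loss gradients: g_i = (x_i . w - y_i) * x_i.
    residuals = X @ w - y
    grads = residuals[:, None] * X          # shape (n_samples, n_features)
    return np.mean(np.linalg.norm(grads, axis=1))

def regularized_risk(w, X, y, penalty_value, lam=0.1):
    # Empirical risk (MSE) plus a weighted regularization term.
    mse = np.mean((X @ w - y) ** 2)
    return mse + lam * penalty_value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
    y = X @ w_true + 0.1 * rng.normal(size=50)
    w = rng.normal(size=5)
    for name, pen in [("L1", l1_penalty(w)),
                      ("L2", l2_penalty(w)),
                      ("grad-norm", grad_norm_penalty(w, X, y))]:
        print(name, round(regularized_risk(w, X, y, pen), 3))
```

Unlike the L^{1}/L^{2} penalties, the gradient-norm term depends on the data `(X, y)` as well as the weights, which loosely mirrors the abstract's claim that the proposed regularizer captures the interaction between the data distribution and the risk functional.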