A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks
TL;DR: A new approach to generalization analysis of deep networks based on connecting non-convex risk minimization objectives to convex relaxations.
Abstract: We propose a general framework for deriving generalization bounds for parallel positively
homogeneous neural networks--a class of neural networks whose input-output map decomposes as
the sum of positively homogeneous maps. Examples of such networks include matrix
factorization and sensing, single-layer multi-head attention mechanisms, tensor
factorization, deep linear and ReLU networks, and more. Our general framework is based on
linking the non-convex empirical risk minimization (ERM) problem to a closely related convex
optimization problem over prediction functions, which provides a global, achievable lower bound
on the ERM problem. We exploit this convex lower bound to perform generalization
analysis in the convex space while controlling the discrepancy between the convex model and
its non-convex counterpart. We apply our general framework to a wide variety of models
ranging from low-rank matrix sensing and structured matrix sensing to two-layer linear
networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms,
achieving generalization bounds with a sample complexity that scales almost linearly with the
network width.
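To make the setting concrete, here is a minimal sketch in illustrative notation (the symbols K, \phi, \theta_k, L, \ell, R, \mathcal{F}, and \Omega are assumptions for exposition, not taken from the paper): a parallel positively homogeneous network sums branch maps that are positively homogeneous in their parameters, and the non-convex ERM objective over parameters is lower-bounded by a convex problem posed over prediction functions.

% Parallel positively homogeneous network: K branches, each map positively
% homogeneous of degree L in its own parameters \theta_k.
\[
f_\Theta(x) \;=\; \sum_{k=1}^{K} \phi(x;\theta_k),
\qquad
\phi(x;\alpha\theta_k) \;=\; \alpha^{L}\,\phi(x;\theta_k) \quad \text{for all } \alpha \ge 0.
\]
% Non-convex ERM over parameters \Theta versus a convex problem over prediction
% functions g in a convex class \mathcal{F}; the latter serves as a global,
% achievable lower bound on the former.
\[
\min_{\Theta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\Theta(x_i), y_i\big) + \lambda\, R(\Theta)
\;\ge\;
\min_{g \in \mathcal{F}}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(g(x_i), y_i\big) + \lambda\, \Omega(g).
\]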
Submission Number: 2143