Anisotropic Random Feature Regression in High Dimensions

29 Sept 2021 (edited 11 May 2022) · ICLR 2022 Poster
  • Keywords: random feature models, high dimensional asymptotics, generalization, learning curves, double descent, multiple descent, alignment
  • Abstract: In contrast to standard statistical wisdom, modern learning algorithms typically find their best performance in the overparameterized regime in which the model has many more parameters than needed to fit the training data. A growing number of recent works have shown that random feature models can offer a detailed theoretical explanation for this unexpected behavior, but typically these analyses have utilized isotropic distributional assumptions on the underlying data generation process, thereby failing to provide a realistic characterization of real-world models that are designed to identify and harness the structure in natural data. In this work, we examine the high-dimensional asymptotics of random feature regression in the presence of structured data, allowing for arbitrary input correlations and arbitrary alignment between the data and the weights of the target function. We define a partial order on the space of weight-data alignments and prove that generalization performance improves in response to stronger alignment. We also clarify several previous observations in the literature by distinguishing the behavior of the sample-wise and parameter-wise learning curves, finding that sample-wise multiple descent can occur at scales dictated by the eigenstructure of the data covariance, but that parameter-wise multiple descent is limited to double descent, although strong anisotropy can induce additional signatures such as wide plateaus and steep cliffs. Finally, these signatures are related to phase transitions in the spectrum of the feature kernel matrix, and unlike the double descent peak, persist even under optimal regularization.
  • One-sentence Summary: We derive exact asymptotic formulas for the total error, bias, and variance of random feature regression with anisotropic inputs and target weights, and identify a new type of singularity in sample-wise learning curves.
  • Supplementary Material: zip
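To make the setting in the abstract concrete, below is a minimal sketch (not the paper's code) of random feature ridge regression with anisotropic Gaussian inputs. The dimensions, the power-law covariance spectrum, the ReLU feature map, and the choice of target weights aligned with the top eigendirections of the data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, p = 50, 200, 500, 400  # input dim, train/test samples, random features

# Anisotropic input covariance: power-law eigenvalue decay (illustrative choice).
eigvals = np.arange(1, d + 1, dtype=float) ** -1.5
X = rng.standard_normal((n_train, d)) * np.sqrt(eigvals)
X_test = rng.standard_normal((n_test, d)) * np.sqrt(eigvals)

# Linear target whose weights are aligned with the top eigendirections of the data.
beta = np.sqrt(eigvals)
y = X @ beta + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ beta

# Random feature map: phi(x) = relu(W x / sqrt(d)) with Gaussian weights W.
W = rng.standard_normal((p, d))
relu = lambda z: np.maximum(z, 0.0)
Phi = relu(X @ W.T / np.sqrt(d))
Phi_test = relu(X_test @ W.T / np.sqrt(d))

# Ridge regression on the random features.
lam = 1e-3
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
test_err = np.mean((Phi_test @ a - y_test) ** 2)
print(f"test MSE: {test_err:.4f}")
```

Sweeping `n_train` (sample-wise) or `p` (parameter-wise) in such a simulation is how the learning curves discussed in the abstract are typically traced out empirically.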