Gibbs-Based Information Criteria and the Over-Parameterized Regime

Published: 07 Nov 2023, Last Modified: 13 Dec 2023M3L 2023 PosterEveryoneRevisionsBibTeX
Keywords: Bayesian Information Criterion, Double Descent, Gibbs algorithm, Information Risk Minimization, Random feature model
Abstract: Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide Bayesian Information Criterion (BIC) for models trained by the Gibbs algorithm. Notably, the BIC penalty term for the Gibbs algorithm corresponds to a specific information measure, i.e., KL divergence. We extend this information-theoretic analysis to over-parameterized models by characterizing the Gibbs-based BIC for the random feature model in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity, with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal the mismatch between marginal likelihood and population risk in the over-parameterized regime, providing new insights for understanding the double-descent phenomenon.
Submission Number: 72