Keywords: learning theory, kernel methods, complexity measures, reproducing kernel Hilbert space, adaptive kernels
TL;DR: We introduce the effective span dimension, a complexity measure that characterizes the minimax excess risk and explains why kernels adapted to data via gradient flow can outperform fixed-kernel methods.
Abstract: We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, the kernel spectrum, and the noise level $\sigma^2$. The ESD is well-defined for arbitrary kernels and signals, without requiring eigen-decay or source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD of a sequence model, moving the problem into an easier ESD class with a lower minimax risk. This analysis suggests a general route for studying how adaptive feature learning improves generalization through signal-kernel alignment: adaptive procedures reshape the kernel so that the ESD decreases. We also extend the ESD framework to linear models and RKHS regression, and we support the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.
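The $\sigma^2 K$ scaling can be illustrated with a minimal numerical sketch. The sketch below assumes a Gaussian sequence model $y_i = \theta_i + \sigma \varepsilon_i$ with a signal supported on $K$ coordinates and compares an estimator aligned with that $K$-dimensional span against a non-adaptive one; the "span dimension" used here is a hypothetical stand-in, not the paper's ESD definition, and the estimators are illustrative rather than the paper's construction.

```python
# Illustrative sketch (not the paper's construction): in a Gaussian sequence
# model y_i = theta_i + sigma * eps_i with the signal supported on the first
# K coordinates, an estimator that keeps only those K coordinates incurs
# excess risk on the order of sigma^2 * K, while a non-adaptive estimator
# that keeps all d coordinates pays roughly sigma^2 * d.
import numpy as np

rng = np.random.default_rng(0)
d, K, sigma = 1000, 10, 0.5
n_trials = 200

theta = np.zeros(d)
theta[:K] = 1.0  # signal aligned with the first K coordinates

risk_aligned, risk_unaligned = 0.0, 0.0
for _ in range(n_trials):
    y = theta + sigma * rng.standard_normal(d)
    # estimator aligned with the signal span: keep only the first K coordinates
    theta_hat_aligned = np.where(np.arange(d) < K, y, 0.0)
    # non-adaptive estimator: keep all d coordinates
    theta_hat_unaligned = y
    risk_aligned += np.sum((theta_hat_aligned - theta) ** 2) / n_trials
    risk_unaligned += np.sum((theta_hat_unaligned - theta) ** 2) / n_trials

print(f"aligned risk   ~ {risk_aligned:.2f}  (sigma^2 * K = {sigma**2 * K:.2f})")
print(f"unaligned risk ~ {risk_unaligned:.2f}  (sigma^2 * d = {sigma**2 * d:.2f})")
```

In this toy setting the aligned estimator's risk concentrates near $\sigma^2 K$ and the non-adaptive one near $\sigma^2 d$, which mirrors the abstract's claim that reshaping the kernel to align with the signal (thereby lowering the ESD) moves the problem into an easier class with smaller minimax risk.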
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 16093