Nonlinearity–Phase–Generalization Theory: OOD Bounds for KANs and MLPs

20 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: OOD generalization, Kolmogorov-Arnold networks, signal processing, Fourier transform
Abstract: Recent theory has sharpened OOD generalization, yet most analyses are weight‑space and assume fixed activations, offering little guidance when nonlinearities are learned, as in Kolmogorov-Arnold Networks (KANs). We develop Nonlinearity–Phase–Generalization (NPG): a function‑space framework that links per‑nonlinearity smoothness (total variation of the derivative, TVD) to phase preservation (via global cross‑bispectrum, GCB) and, in turn, to the source–target risk gap. The resulting bounds are finite‑width, additive in depth, and architecture‑aware, placing KANs and MLPs on common footing. NPG yields actionable rules: select smaller‑TVD nonlinearities; for polynomial KANs, keep degree-range product small and decorrelate layers; within MLPs, Softplus reduces the model term vs ReLU under bounded inputs. Controlled PACS/VLCS studies follow these predictions (e.g., OOD gap: 21.85 vs 26.53 on PACS; 9.87 vs 12.67 on VLCS), and TVD/GCB operate as training diagnostics. By tying nonlinearity design to OOD error, NPG enables principled architecture choices and reproducible tuning under domain shift. Code is available at: https://github.com/***
Primary Area: learning theory
Submission Number: 23614
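
Illustration (not part of the submission): the abstract states that, within MLPs, Softplus reduces the model term relative to ReLU under bounded inputs, and that this term is driven by the total variation of the derivative (TVD). The sketch below is one plausible reading of that claim, not the authors' code. It assumes TVD means the total variation of the activation's first derivative over a bounded interval [-B, B]; the paper's exact definition may differ. The names `tv_of_derivative`, `relu_grad`, `softplus_grad`, and the choice B = 3 are hypothetical.

```python
import math


def relu_grad(x):
    # Derivative of ReLU: a step from 0 to 1 at the origin (taken as 0 at x == 0).
    return 1.0 if x > 0 else 0.0


def softplus_grad(x):
    # Derivative of softplus(x) = log(1 + exp(x)) is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-x))


def tv_of_derivative(grad, bound, n=10001):
    # Assumed TVD proxy: sum of absolute increments of the derivative
    # on a uniform grid over [-bound, bound].
    xs = [-bound + 2.0 * bound * i / (n - 1) for i in range(n)]
    gs = [grad(x) for x in xs]
    return sum(abs(b - a) for a, b in zip(gs, gs[1:]))


B = 3.0  # hypothetical input bound
tvd_relu = tv_of_derivative(relu_grad, B)
tvd_softplus = tv_of_derivative(softplus_grad, B)

print("TVD proxy, ReLU:    ", tvd_relu)
print("TVD proxy, Softplus:", tvd_softplus)
```

Under this assumed definition, ReLU's derivative makes a single unit jump, so its TVD is 1 on any interval containing the origin. Softplus's derivative is monotone, so its TVD equals sigmoid(B) - sigmoid(-B), which is strictly below 1 for every finite B. The difference shrinks as B grows, which is consistent with the abstract's restriction to bounded inputs.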