Keywords: Foundation models, transformers, diffusion models, state-space models, RKHS, Neural Tangent Kernel, approximation theory, survey, open problems
TL;DR: Unified RKHS/NTK view of transformers, diffusion, and SSMs; taxonomy of 114 results; approximation rates and seven open problems with difficulty ratings.
Abstract: Foundation models (transformers, diffusion models, state-space models) have achieved remarkable empirical success, yet their theoretical understanding remains fragmented across different mathematical communities. This survey provides a unified mathematical perspective connecting approximation theory, optimization landscape analysis, and statistical learning theory through the lens of Reproducing Kernel Hilbert Spaces (RKHS) and Neural Tangent Kernel (NTK) theory. We present a comprehensive taxonomy of 114 recent theoretical results organized by mathematical tool, establish a unified framework showing how attention mechanisms, score functions, and convolution kernels can be understood as kernel-based approximators, and derive precise comparison theorems between architectures. Our analysis reveals that transformers achieve approximation rate $O(n^{-2s/d})$ for Sobolev-$s$ functions with $O(n^2)$ complexity, while state-space models achieve $O(n^{-s/d})$ with $O(n)$ complexity, suggesting fundamental complexity-expressivity tradeoffs. We identify seven concrete open problems with partial results and difficulty ratings, propose a research roadmap connecting optimization and generalization, and highlight promising directions for neural architecture design. This unified perspective aims to bridge theory and practice, providing foundational insights for developing more principled and efficient foundation model architectures.
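For quick reference, the headline rate-versus-complexity claim from the abstract can be displayed side by side. This is only a restatement of the rates quoted above; the error norm (taken here as $L^2$), the Sobolev-$s$ target class, and exactly what the size parameter $n$ counts are assumptions not pinned down in this abstract.

```latex
% Restatement of the abstract's comparison (assumed L^2 approximation error over a
% Sobolev-s target class on a d-dimensional domain; n is the model-size parameter
% in which both the rate and the per-step cost are expressed).
\[
  \text{Transformers:}\quad \mathrm{err}(n) = O\!\big(n^{-2s/d}\big)
  \ \text{at cost } O(n^2),
  \qquad
  \text{State-space models:}\quad \mathrm{err}(n) = O\!\big(n^{-s/d}\big)
  \ \text{at cost } O(n).
\]
```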
Submission Number: 157