Keywords: Kolmogorov–Arnold networks, weight initialization, NTK, function fitting, PDEs
TL;DR: We propose new initialization schemes for KANs, showing that our Glorot-inspired method outperforms the default on large architectures, while our empirical power-law scheme consistently achieves the best results.
Abstract: Kolmogorov–Arnold Networks (KANs) are a recently introduced neural architecture that replaces fixed nonlinearities with trainable activation functions, offering greater flexibility and interpretability. While KANs have been applied successfully across scientific and machine learning tasks, their initialization strategies remain largely unexplored. In this work, we study initialization schemes for spline-based KANs, proposing two theory-driven approaches inspired by LeCun and Glorot initialization, as well as an empirical power-law family with tunable exponents. Our evaluation combines large-scale grid searches on function-fitting and forward PDE benchmarks, an analysis of training dynamics through the lens of the Neural Tangent Kernel, and experiments on a subset of the Feynman dataset. Our findings indicate that the Glorot-inspired initialization significantly outperforms the baseline in parameter-rich models, while power-law initialization achieves the strongest performance overall, across tasks and across architectures of varying size. This work underscores initialization as a key factor in KAN performance and introduces practical strategies to improve it.
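The abstract does not spell out the exact scaling rules, so the following is a minimal illustrative sketch only: it assumes the Glorot-inspired scheme scales the standard deviation of the spline coefficients by sqrt(2 / (fan_in + fan_out)), by analogy with Glorot/Xavier initialization for linear layers, and that the power-law family scales it as a power of the layer width with a hypothetical exponent `alpha` and prefactor `c`. The shapes, names, and formulas here are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def glorot_spline_init(n_in, n_out, n_coef, rng=None):
    """Glorot-inspired sketch (assumed form): draw B-spline coefficients
    with variance 2 / (fan_in + fan_out), by analogy with Glorot init."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / (n_in + n_out))
    # One coefficient vector of length n_coef per (input, output) edge.
    return rng.normal(0.0, std, size=(n_out, n_in, n_coef))

def power_law_spline_init(n_in, n_out, n_coef, alpha=0.5, c=1.0, rng=None):
    """Power-law sketch (assumed form): coefficient scale decays as a
    power of the input width, with tunable exponent alpha and prefactor c."""
    rng = np.random.default_rng() if rng is None else rng
    std = c * n_in ** (-alpha)
    return rng.normal(0.0, std, size=(n_out, n_in, n_coef))

# Example: a 4 -> 16 spline-KAN layer with 8 B-spline coefficients per edge.
coef = glorot_spline_init(4, 16, 8)
print(coef.shape, coef.std())
```

In both sketches, shrinking the coefficient scale with width plays the same role as in classical variance-scaling initializations: it keeps pre-activation magnitudes roughly constant as layers grow, which is the motivation the abstract attributes to the LeCun- and Glorot-inspired schemes.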
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 1805