Keywords: Kolmogorov–Arnold networks, deep learning, initialization, power law
TL;DR: We investigate how different initialization strategies affect the training of Kolmogorov–Arnold Networks, focusing on the effectiveness of a simple power-law scheme.
Abstract: Kolmogorov–Arnold Networks (KANs) are a recently introduced class of neural architectures that use trainable activation functions instead of fixed ones, offering greater flexibility and interpretability. Although KANs have shown promising results across various tasks, little attention has been given to how they should be initialized. In this work, we explore alternative initialization strategies, including two variance-preserving methods based on classical ideas and an empirical power-law approach with tunable exponents. Using function fitting as a small-scale testbed, we run a large grid search over architectures and initialization settings. We find that power-law configurations consistently outperform the standard baseline initialization across all architectures. The variance-preserving methods tend to underperform on smaller models but surpass the baseline as networks grow deeper and wider, though they still fall short of power-law initialization. Overall, our results highlight initialization as an important yet underexplored aspect of KANs and point to several directions for future work.
Code: zip
Jupyter Notebook: ipynb
Submission Number: 8
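For intuition, the sketch below illustrates what a power-law initialization of a KAN layer's spline coefficients might look like. The functional form (standard deviation scaled as c * fan_in^(-alpha) * grid_size^(-beta)) and the parameter names c, alpha, and beta are assumptions chosen for illustration, not the paper's exact scheme; see the attached code and notebook for the actual implementation.

```python
# Illustrative sketch of a power-law initialization for one KAN layer.
# ASSUMPTION: sigma = c * fan_in**(-alpha) * grid_size**(-beta); the paper's
# exact scaling rule and exponent definitions may differ.
import numpy as np


def power_law_init(fan_in, fan_out, grid_size,
                   c=1.0, alpha=0.5, beta=0.5, rng=None):
    """Draw spline coefficients with a power-law-scaled standard deviation."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = c * fan_in ** (-alpha) * grid_size ** (-beta)
    # One coefficient vector of length grid_size per (input, output) edge.
    return rng.normal(0.0, sigma, size=(fan_out, fan_in, grid_size))


# Example: a 2 -> 5 KAN layer with 8 spline coefficients per edge.
coeffs = power_law_init(fan_in=2, fan_out=5, grid_size=8)
print(coeffs.shape, coeffs.std())
```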