Provable Benefits of Sinusoidal Activation for Modular Addition

ICLR 2026 Conference Submission 14553 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Modular addition, two-layer MLP, periodicity
TL;DR: Sine-activated two-layer MLPs on modular addition are provably—and empirically—more expressive and easier to learn than ReLU.
Abstract: This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first show that sine activations achieve better expressiveness than ReLU activations, in the sense that the width of ReLU networks must scale linearly with the number of summands $m$ to interpolate, whereas sine networks need only two neurons. We then provide a novel Natarajan-dimension generalization bound for sine networks, which in turn leads to a nearly optimal sample complexity of $\widetilde{\mathcal{O}}(p)$ for ERM over constant-width sine networks, where $p$ is the modulus. We also provide a margin-based generalization bound for sine networks in the overparametrized regime. We empirically validate the improved generalization of sine networks over ReLU networks, as well as our margin theory.
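To make the setting concrete, below is a minimal sketch of a two-layer MLP with sine activation trained on modular addition, assuming two summands ($m = 2$), one-hot inputs, and hypothetical choices of modulus, width, and optimizer; the paper's exact experimental configuration is not specified here, so all hyperparameters are illustrative.

```python
# Sketch only: architecture beyond "linear -> sin -> linear" and all
# hyperparameters (p, width, lr, steps) are illustrative assumptions.
import torch
import torch.nn as nn

p = 97       # modulus (hypothetical choice)
width = 128  # hidden width (hypothetical choice)

# All (a, b) pairs in Z_p x Z_p, one-hot encoded, with labels (a + b) mod p.
a, b = torch.meshgrid(torch.arange(p), torch.arange(p), indexing="ij")
a, b = a.reshape(-1), b.reshape(-1)
X = torch.cat([nn.functional.one_hot(a, p),
               nn.functional.one_hot(b, p)], dim=1).float()
y = (a + b) % p

class SineMLP(nn.Module):
    """Two-layer MLP: linear -> sin -> linear, producing p class logits."""
    def __init__(self, in_dim, width, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, width)
        self.fc2 = nn.Linear(width, out_dim)

    def forward(self, x):
        return self.fc2(torch.sin(self.fc1(x)))

model = SineMLP(2 * p, width, p)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean()
print(f"final loss {loss.item():.4f}, accuracy {acc.item():.3f}")
```

Swapping `torch.sin` for `torch.relu` in the same sketch gives the ReLU baseline against which the expressiveness and generalization claims are compared.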
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 14553