TESLA: Taylor Expansion of Sinusoidal Learnable Activations
TL;DR: A learnable sinusoidal activation trained through its Taylor coefficients, with a harmonic ($\ell_1$) budget that controls polynomial order, stabilizes training via Lipschitz bounds, supports Rademacher/NTK analyses, and improves performance on high-order tasks and ImageNet-100.
Abstract: The parity problem (deciding whether the number of ones in a binary vector is odd or even) remains challenging for standard neural networks due to linear inseparability and the need for global interactions. We propose TESLA, an activation defined as a learnable combination of sine and cosine terms whose Taylor coefficients are trained directly, enabling explicit control over polynomial degree and selective amplification of high-order components. Theoretically, we show that constraining TESLA's coefficients yields Lipschitz/Rademacher complexity bounds and shapes the training dynamics to emphasize higher-frequency structure. Empirically, on parity with input length $n=32$, TESLA achieves perfect generalization using ~100K training samples ($\approx 0.002\%$ of the $2^{32}$ input space). Notably, TESLA maintains strong generalization under heavy corruption, retaining high accuracy with up to 30\% label noise in the parity signals. Beyond synthetic structure, TESLA delivers comparable performance on ImageNet-100, indicating that activation-level degree control transfers to more general vision workloads. These findings suggest that TESLA is an effective mechanism for improving expressivity and sample efficiency on tasks requiring global structure.
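The abstract does not give the exact parameterization, so the following is a minimal sketch of one possible reading: a truncated Taylor-series activation whose coefficients are trained directly (initialized from the expansion of $\sin(x)$), with an $\ell_1$ "harmonic budget" penalty as mentioned in the TL;DR. The class and method names (`TeslaActivation`, `l1_budget`), the truncation degree, and the sine-only initialization are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn


class TeslaActivation(nn.Module):
    """Sketch of a TESLA-style activation: a truncated Taylor series
    phi(x) = sum_{k=0}^{K} c_k x^k / k!, with c_k trained directly."""

    def __init__(self, degree: int = 7):
        super().__init__()
        self.degree = degree
        # Initialize c_k to the Taylor coefficients of sin(x):
        # sin(x) = x - x^3/3! + x^5/5! - ...
        init = torch.zeros(degree + 1)
        for k in range(1, degree + 1, 2):
            init[k] = (-1.0) ** ((k - 1) // 2)
        self.coeffs = nn.Parameter(init)
        # Fixed 1/k! scaling so the trainable part is the coefficient c_k.
        inv_fact = torch.tensor([1.0 / math.factorial(k) for k in range(degree + 1)])
        self.register_buffer("inv_factorial", inv_fact)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Evaluate the truncated series elementwise.
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        return (powers * (self.coeffs * self.inv_factorial)).sum(dim=-1)

    def l1_budget(self) -> torch.Tensor:
        # Harmonic (l1) budget over the learned coefficients; adding
        # lam * l1_budget() to the loss constrains the effective polynomial order.
        return self.coeffs.abs().sum()
```

In use, such an activation would replace a fixed nonlinearity after each linear layer, with a term like `lam * act.l1_budget()` added to the training loss to enforce the coefficient budget.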
Submission Number: 2415