- Abstract: Most deep neural networks use non-periodic and monotonic—or at least quasiconvex— activation functions. While sinusoidal activation functions have been successfully used for specific applications, they remain largely ignored and regarded as difficult to train. In this paper we formally characterize why these networks can indeed often be difficult to train even in very simple scenarios, and describe how the presence of infinitely many and shallow local minima emerges from the architecture. We also provide an explanation to the good performance achieved on a typical classification task, by showing that for several network architectures the presence of the periodic cycles is largely ignored when the learning is successful. Finally, we show that there are non-trivial tasks—such as learning algorithms—where networks using sinusoidal activations can learn faster than more established monotonic functions.
- TL;DR: Why nets with sine as activation function are difficult to train in theory. Also, they often don't use the periodic part if not needed, but when it's beneficial they might learn faster
- Conflicts: tut.fi
- Keywords: Theory, Deep learning, Optimization, Supervised Learning