Taming the waves: sine as activation function in deep neural networks
Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen
Nov 04, 2016 (modified: Jan 13, 2017) · ICLR 2017 conference submission · Readers: everyone
Abstract: Most deep neural networks use non-periodic and monotonic (or at least quasiconvex) activation functions. While sinusoidal activation functions have been successfully used for specific applications, they remain largely ignored and regarded as difficult to train. In this paper we formally characterize why these networks can indeed often be difficult to train, even in very simple scenarios, and describe how the presence of infinitely many shallow local minima emerges from the architecture. We also provide an explanation for the good performance achieved on a typical classification task, by showing that for several network architectures the periodicity of the activation is largely ignored when learning is successful. Finally, we show that there are non-trivial tasks, such as learning algorithms, where networks with sinusoidal activations can learn faster than those with more established monotonic activation functions.
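As an illustrative sketch of the abstract's central claim (this example is not from the paper itself): even a one-parameter model y = sin(w·x), fit by squared loss to data generated from a true weight w* = 2, produces a loss surface in w with many shallow local minima alongside the global one. Scanning the loss over a grid of w makes this visible.

```python
import numpy as np

# Hypothetical minimal example: one-weight "network" y_hat = sin(w * x),
# targets generated with true weight w* = 2. The squared loss as a
# function of w is oscillatory, with many shallow local minima --
# the training difficulty the abstract attributes to sine activations.
rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=200)
y = np.sin(2.0 * x)  # targets from the true weight w* = 2

ws = np.linspace(-10.0, 10.0, 4001)
loss = np.array([np.mean((np.sin(w * x) - y) ** 2) for w in ws])

# Count strict interior local minima of the sampled loss curve.
interior = (loss[1:-1] < loss[:-2]) & (loss[1:-1] < loss[2:])
print("local minima found:", int(interior.sum()))
print("global minimum near w =", ws[np.argmin(loss)])
```

Gradient descent started far from w* = 2 would typically settle in one of the shallow basins rather than the global minimum, which is the failure mode the paper formalizes.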
TL;DR: A theoretical account of why networks with sine as activation function are difficult to train. In practice they often ignore the periodic part when it is not needed, but when it is beneficial they can learn faster.
Keywords: Theory, Deep learning, Optimization, Supervised Learning