Revise Saturated Activation Functions
Bing Xu, Ruitong Huang, Mu Li
Feb 18, 2016 (modified: Feb 18, 2016) · ICLR 2016 workshop submission · readers: everyone
Abstract: In this paper, we revise two commonly used saturated functions, the
logistic sigmoid and the hyperbolic tangent (tanh).
We point out that, besides the well-known non-zero-centered property, the slope of the
activation function near the origin is another possible reason why deep networks
with the logistic function are difficult to train. We demonstrate that, with proper
rescaling, the logistic sigmoid achieves results comparable with tanh.
Then, following the same argument, we improve tanh by penalizing it in the negative part.
We demonstrate that this ``penalized tanh'' is comparable to and even outperforms the state-of-the-art
non-saturated functions, including ReLU and leaky ReLU, on deep convolutional neural networks.
Our results contradict the conclusion of previous works that the saturation
property causes slow convergence. They suggest that further investigation is necessary to
better understand activation functions in deep architectures.
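The two activations discussed in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact definitions: the rescaling below uses the standard identity tanh(x) = 2*sigmoid(2x) - 1 to match the sigmoid's slope at the origin to tanh's, and the attenuation factor `a = 0.25` in the penalized tanh is an illustrative choice rather than a value taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rescaled_sigmoid(x):
    # 2*sigmoid(2x) - 1 equals tanh(x) exactly, so this rescaling gives
    # the sigmoid a slope of 1 at the origin and a zero-centered output.
    return 2.0 * sigmoid(2.0 * x) - 1.0

def penalized_tanh(x, a=0.25):
    # Identical to tanh on the positive part; the negative part is
    # attenuated by a factor a in (0, 1), analogous to leaky ReLU.
    # The value a=0.25 is a hypothetical example, not from the paper.
    return np.where(x > 0, np.tanh(x), a * np.tanh(x))

x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(rescaled_sigmoid(x), np.tanh(x)))  # True
```

The rescaling identity also makes the slope argument concrete: the plain sigmoid has a slope of only 0.25 at the origin, while tanh (and the rescaled sigmoid) has a slope of 1.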