Keywords: activation, imbalanced activation, normalization, layer-wise balanced activation, layer-level activation, LayerAct
TL;DR: Layer-wise Balanced Activation Mechanism
Abstract: We propose a novel activation mechanism, LayerAct, for developing layer-wise balanced activation functions that converge faster and perform better than existing activation functions. During backpropagation in a neural network, the scale of an activation function's output determines how strongly the parameters are updated by each sample. Consequently, training can be biased toward certain samples when the activation scale is imbalanced among samples. With a simple experiment on an unnormalized network with rectified linear units (ReLUs), we show a relationship between the sum of activation scales and the training loss, indicating that an imbalanced activation scale among samples can bias learning. Layer normalization (LayerNorm) can mitigate this bias by balancing the layer-wise distribution of the activation functions' inputs. However, LayerNorm discards the mean and variance statistics of the activated instances across samples through its re-centering and re-scaling. Our proposed LayerAct mechanism balances the layer-wise distribution of activation outputs for all samples without re-centering or re-scaling; in this way, LayerAct functions avoid not only the bias in learning but also the dilution of these key statistics. LayerAct functions allow negative activation outputs when the activated signal should be negative, which helps avoid bias shift during learning and yields richer representations. Moreover, the proposed LayerAct mechanism can be combined with batch normalization (BatchNorm). Experiments show that LayerAct functions outperform unbalanced element-level activation functions on two benchmark image classification datasets, CIFAR10 and CIFAR100. Given the essential role of activation in multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and modern deep learning frameworks, our work on layer-wise activation, which addresses a core mechanism of learning through multiple layers, will contribute to developing high-performance machine learning frameworks.
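The abstract contrasts LayerNorm, which re-centers and re-scales the activation inputs, with LayerAct functions, which balance the layer-wise distribution of activation outputs without those operations. The sketch below is a minimal illustration of one way such a function could look, assuming a LayerAct-style variant of SiLU whose gate is computed from a layer-direction standardized copy of the pre-activation while the original (un-normalized) signal remains the multiplicand. The name `layer_act_silu` and this exact formulation are our assumptions for illustration, not the paper's definitive implementation.

```python
import torch
import torch.nn.functional as F

def layer_act_silu(x, eps=1e-5):
    """Hypothetical LayerAct-style SiLU sketch.

    The sigmoid gate is evaluated on a layer-direction standardized copy of
    the pre-activation, so the activation scale is comparable across samples,
    while the raw signal x itself is neither re-centered nor re-scaled.
    """
    # Standardize across the feature (layer) dimension, separately per sample.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Gate the *original* signal with a scale computed from the balanced copy;
    # negative outputs remain possible when x is negative.
    return x * torch.sigmoid(x_hat)

if __name__ == "__main__":
    x = torch.randn(8, 64)      # batch of 8 samples, 64 features
    out = layer_act_silu(x)
    baseline = F.silu(x)        # element-level baseline: gate depends on raw x only
    print(out.shape, baseline.shape)
```

Under this assumed formulation, the gate's distribution is balanced across samples because it is computed from the standardized copy, while the output retains the mean and variance information of the raw pre-activation and can still be negative when the input is negative.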
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)