Keywords: regularization, overfitting, output layer, architecture, deep learning, neural networks
TL;DR: Simple and effective output layer designs that improve regularization in deep neural networks
Abstract: Deep neural networks are prone to overfitting, especially on small datasets. Common regularizers such as dropout or dropconnect reduce overfitting but are complex and sensitive to hyperparameter choices, thus prolonging development cycles in practice. In this paper, we propose simple but effective design changes to the output layer - namely randomization, sparsity, activation scaling, and ensembling - that lead to improved regularization. These designs are motivated by experiments showing that standard fully-connected output layers tend to rely on individual input neurons, which in turn do not cover the variance of the data. We call these two related phenomena neuron dependency and expressivity, propose different ways to measure them, and optimize the presented output layers for them. In our experiments, we compare these layer types for image classification and semantic segmentation across architectures, datasets, and application settings. We report significantly and consistently improved performance of up to 10 percentage points in accuracy over standard output layers while reducing the number of trainable parameters by up to 90%. We demonstrate that neither training of the output layer is required, nor are output layers themselves crucial components of deep networks.
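The sketch below illustrates two of the design ideas mentioned in the abstract - randomization and sparsity in the output layer - as a minimal PyTorch example. It is not the paper's implementation: the class name `FixedSparseRandomOutput`, the `sparsity` and `scale` parameters, and the masking scheme are illustrative assumptions, intended only to show how a frozen, sparsely connected random classifier head with zero trainable parameters could be wired up.

```python
# Hypothetical sketch (not the paper's exact method): a fixed, sparse, random
# output layer with frozen weights, illustrating the "randomization" and
# "sparsity" designs described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedSparseRandomOutput(nn.Module):
    """Linear output layer with frozen, sparsely initialized random weights."""

    def __init__(self, in_features: int, num_classes: int,
                 sparsity: float = 0.9, scale: float = 1.0):
        super().__init__()
        weight = torch.randn(num_classes, in_features)
        # Zero out a fraction of connections to enforce sparsity.
        mask = (torch.rand_like(weight) > sparsity).float()
        # Register as a buffer: the layer contributes no trainable parameters.
        self.register_buffer("weight", weight * mask * scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Plain linear projection of backbone features onto class logits.
        return F.linear(x, self.weight)


if __name__ == "__main__":
    # Usage: swap a backbone's trainable classifier head for the fixed layer.
    head = FixedSparseRandomOutput(in_features=512, num_classes=10, sparsity=0.9)
    features = torch.randn(8, 512)   # e.g. pooled features for a batch of 8
    logits = head(features)          # shape: (8, 10)
    print(logits.shape, sum(p.numel() for p in head.parameters()))  # 0 trainable params
```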
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: zip