Abstract: Growing demand for accuracy and performance in neural networks has led to deeper networks with large numbers of parameters. Overfitting is a major problem for such networks. Dropout is a popular regularization strategy for mitigating overfitting in deep neural networks. However, dropout requires a hyperparameter, the dropout rate, to be chosen for every dropout layer, which becomes tedious when the network has several such layers. In this paper, we introduce a method that samples the dropout rate from an automatically determined distribution. We then build on this automatic selection by clustering the activations and adaptively applying a different rate to each cluster. We evaluate both approaches on the CIFAR-10, CIFAR-100, and Fashion-MNIST datasets, using two state-of-the-art Wide ResNet variants as well as a simpler network. We show that our methods outperform standard dropout across all datasets and networks.
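
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. The abstract does not say which distribution the rate is drawn from, how that distribution is determined, or how activations are clustered, so everything specific here is an assumption for illustration: the function names `sampled_rate_dropout` and `clustered_dropout`, the uniform range `low`/`high`, the median-magnitude split standing in for clustering, and the per-cluster `rates`.

```python
import numpy as np


def sampled_rate_dropout(x, rng, low=0.1, high=0.5):
    """Inverted dropout whose rate is drawn afresh on every call.

    Assumption: the rate is sampled from Uniform(low, high). The abstract
    only says the distribution is "automatically determined".
    """
    rate = rng.uniform(low, high)
    # Keep each activation with probability 1 - rate.
    mask = rng.random(x.shape) >= rate
    # Rescale survivors so the expected activation is unchanged.
    return x * mask / (1.0 - rate), rate


def clustered_dropout(x, rng, rates=(0.1, 0.5)):
    """Apply a different dropout rate to each of two activation clusters.

    Assumption: a split at the median absolute activation stands in for
    the clustering step, and the per-cluster rates are fixed by hand.
    """
    magnitude = np.abs(x)
    in_high_cluster = magnitude > np.median(magnitude)
    # Per-activation rate: rates[1] for the high cluster, rates[0] otherwise.
    per_unit_rate = np.where(in_high_cluster, rates[1], rates[0])
    mask = rng.random(x.shape) >= per_unit_rate
    return x * mask / (1.0 - per_unit_rate)


rng = np.random.default_rng(0)
activations = rng.standard_normal((4, 8))
dropped, used_rate = sampled_rate_dropout(activations, rng)
clustered = clustered_dropout(activations, rng)
```

Both functions are meant for training time only; at inference the activations would pass through unchanged, as with standard inverted dropout.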