Keywords: model complexity, regularization, deep learning, model generalization, adaptive dropout
TL;DR: We propose a novel framework to adaptively adjust the dropout rates for the deep neural network based on a Rademacher complexity bound.
Abstract: We propose a novel framework to adaptively adjust the dropout rates for the deep neural network based on a Rademacher complexity bound. The state-of-the-art deep learning algorithms impose dropout strategy to prevent feature co-adaptation. However, choosing the dropout rates remains an art of heuristics or relies on empirical grid-search over some hyperparameter space. In this work, we show the network Rademacher complexity is bounded by a function related to the dropout rate vectors and the weight coefficient matrices. Subsequently, we impose this bound as a regularizer and provide a theoretical justified way to trade-off between model complexity and representation power. Therefore, the dropout rates and the empirical loss are unified into the same objective function, which is then optimized using the block coordinate descent algorithm. We discover that the adaptively adjusted dropout rates converge to some interesting distributions that reveal meaningful patterns.Experiments on the task of image and document classification also show our method achieves better performance compared to the state-of the-art dropout algorithms.