Adaptive Dropout with Rademacher Complexity Regularization
Nov 03, 2017 (modified: Nov 03, 2017) · ICLR 2018 Conference Blind Submission
Abstract: We propose a novel framework to adaptively adjust the dropout rates of a deep neural network based on a Rademacher complexity bound. State-of-the-art
deep learning algorithms employ dropout to prevent feature co-adaptation.
However, choosing the dropout rates remains a heuristic art or
relies on empirical grid search over some hyperparameter space. In this work,
we show that the network's Rademacher complexity is bounded by a function
of the dropout rate vectors and the weight coefficient matrices. We then
impose this bound as a regularizer, which provides a theoretically justified way to trade off
model complexity against representation power. The dropout
rates and the empirical loss are thus unified in a single objective function, which is
then optimized using a block coordinate descent algorithm. We find that
the adaptively adjusted dropout rates converge to interesting distributions
that reveal meaningful patterns. Experiments on image and document
classification tasks also show that our method achieves better performance than
state-of-the-art dropout algorithms.
TL;DR: We propose a novel framework to adaptively adjust the dropout rates of a deep neural network based on a Rademacher complexity bound.
Keywords: model complexity, regularization, deep learning, model generalization, adaptive dropout
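
The abstract describes a single objective combining the empirical loss with a Rademacher-complexity regularizer that depends on both the dropout rates and the weight matrices, optimized by block coordinate descent (alternating between the two blocks of variables). The exact bound is not given in the abstract, so the sketch below uses a hypothetical surrogate regularizer (retain probability times the squared Frobenius norms of the weights) purely to illustrate the alternating-optimization structure; the names `AdaptiveDropoutMLP`, `complexity_surrogate`, and `train_step` are illustrative and not from the paper.

```python
# Illustrative sketch only: the paper's actual Rademacher complexity bound and
# update rules are not given in the abstract. Here we assume a surrogate
# regularizer R(W, p) = p * (||W1||_F^2 + ||W2||_F^2), where p is the retain
# probability of the hidden layer's dropout, and minimize loss + lam * R by
# block coordinate descent: alternately updating the weights (p fixed) and the
# retain probability (weights fixed).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveDropoutMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        # Trainable logit of the hidden layer's retain probability
        # (a hypothetical parameterization, not taken from the paper).
        self.retain_logit = nn.Parameter(torch.tensor(1.0))

    def retain_prob(self):
        return torch.sigmoid(self.retain_logit)

    def forward(self, x):
        h = F.relu(self.fc1(x))
        p = self.retain_prob()
        if self.training:
            # Standard inverted dropout with the current retain probability.
            mask = torch.bernoulli(torch.full_like(h, p.item()))
            h = h * mask / p
        return self.fc2(h)

    def complexity_surrogate(self):
        # Hypothetical stand-in for the Rademacher complexity bound:
        # retain probability times the squared Frobenius norms of the weights.
        return self.retain_prob() * (
            self.fc1.weight.norm() ** 2 + self.fc2.weight.norm() ** 2
        )


def train_step(model, x, y, w_opt, p_opt, lam=1e-4):
    # Block 1: update the weights with the dropout rate held fixed.
    model.retain_logit.requires_grad_(False)
    w_opt.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * model.complexity_surrogate()
    loss.backward()
    w_opt.step()

    # Block 2: update the retain probability with the weights held fixed.
    model.retain_logit.requires_grad_(True)
    p_opt.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * model.complexity_surrogate()
    loss.backward()
    p_opt.step()
    return loss.item()


if __name__ == "__main__":
    model = AdaptiveDropoutMLP()
    w_params = [p for n, p in model.named_parameters() if n != "retain_logit"]
    w_opt = torch.optim.SGD(w_params, lr=0.1)
    p_opt = torch.optim.SGD([model.retain_logit], lr=0.01)
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    for _ in range(5):
        train_step(model, x, y, w_opt, p_opt)
    print("learned retain probability:", model.retain_prob().item())
```

In this sketch the regularizer pushes the retain probability down (more dropout) when the weight norms are large, while the data term pulls it back up, which mimics the trade-off between model complexity and representation power described in the abstract; the paper's actual bound and its optimization may differ.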