MoReDrop: Dropout without Dropping

Published: 18 Jun 2024 · Last Modified: 11 Jul 2024 · WANT@ICML 2024 Poster · CC BY 4.0
Keywords: Deep Learning, Dropout, Scalability, Model Distributional Shift, Regularization
TL;DR: We propose MoReDrop, which mitigates model distributional shift in dropout models by actively updating only the dense model, using a dense-to-sub regularizer to retain the benefits of dropout. We also present MoReDropL, a lighter, more efficient variant.
Abstract: Dropout is a widely adopted technique that significantly improves the generalization of deep neural networks across various domains. However, the discrepancy in model configurations between the training and evaluation phases introduces a significant challenge: model distributional shift. In this study, we introduce an approach termed Model Regularization for Dropout (MoReDrop). MoReDrop actively updates only the dense model during training, optimizing its loss function and thus eliminating the primary source of distributional shift. To further leverage the benefits of dropout, we introduce a regularizer derived from the output divergence between the dense model and its dropout sub-models. Importantly, the sub-models receive passive updates owing to their shared parameters with the dense model. To reduce computational demands, we introduce a streamlined variant of MoReDrop, referred to as MoReDropL, which applies dropout exclusively in the final layer. Our experiments, conducted on several benchmarks across multiple domains, consistently demonstrate the scalability, efficiency, and robustness of our proposed algorithms.
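The abstract's training recipe (task loss on the dense forward pass plus a dense-to-sub output-divergence regularizer, with sub-models updated only passively through weight sharing) can be sketched as a single PyTorch training step. The sketch below is illustrative only: the `MLP` architecture, the KL divergence choice, the `reg_weight` coefficient, and the decision to detach the dropout branch (so that gradients flow solely through the dense model, matching "actively updates only the dense model") are assumptions, not details confirmed by this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Toy network whose forward pass can be run with or without dropout.
    Both passes share the same parameters (hypothetical example)."""
    def __init__(self, in_dim, hidden, out_dim, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        self.p = p

    def forward(self, x, use_dropout=False):
        h = F.relu(self.fc1(x))
        if use_dropout:
            # Dropout applied only when requested; for a MoReDropL-style
            # variant one would restrict dropout to the final layer.
            h = F.dropout(h, p=self.p, training=True)
        return self.fc2(h)

def moredrop_step(model, optimizer, x, y, reg_weight=1.0):
    """One MoReDrop-style training step (sketch, assumptions noted above)."""
    dense_logits = model(x, use_dropout=False)  # dense forward pass
    drop_logits = model(x, use_dropout=True)    # dropout forward pass, shared weights

    # Task loss is computed on the dense output, so the dense model is the
    # one actively optimized.
    task_loss = F.cross_entropy(dense_logits, y)

    # Dense-to-sub regularizer: penalize divergence between dense and dropout
    # outputs. The dropout branch is detached (assumption) so gradients flow
    # only through the dense model; sub-models are updated passively via the
    # shared parameters.
    reg = F.kl_div(
        F.log_softmax(dense_logits, dim=-1),
        F.softmax(drop_logits.detach(), dim=-1),
        reduction="batchmean",
    )

    loss = task_loss + reg_weight * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Minimal usage example with random data (illustrative only).
if __name__ == "__main__":
    model = MLP(in_dim=20, hidden=64, out_dim=5)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
    print(moredrop_step(model, optimizer, x, y))
```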
Submission Number: 6