MoReDrop: Dropout Without Dropping

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Deep Learning, Dropout, Regularization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose MoReDrop to mitigate distributional shifts in dropout models by prioritizing the dense model's loss function and adding a regularization from dense-sub model pairs. We also present MoReDropL, a lighter, more efficient variant.
Abstract: Dropout has been instrumental in enhancing the generalization capabilities of deep neural networks across a wide range of domains. However, its deployment introduces a significant challenge: a model distributional shift between the training and evaluation phases. Previous approaches have concentrated primarily on regularization methods that invariably employ the sub-model loss as the primary loss function. These methods nonetheless suffer a persistent distributional shift at evaluation, a consequence of the implicit expectation over sub-models taken during the evaluation process. In this study, we introduce a new approach, Model Regularization for Dropout (MoReDrop). MoReDrop addresses distributional shift by prioritizing the loss function of the dense model, supplemented by a regularization term derived from dense-sub model pairs. This approach allows us to leverage the benefits of dropout without requiring gradient updates in the sub-models. To reduce the computational cost, we propose a lightweight variant, MoReDropL, which trades a degree of generalization ability for a lower computational burden by applying dropout only at the last layer. Experimental evaluations on several benchmarks across multiple domains consistently demonstrate the scalability and efficiency of our proposed algorithms.
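For concreteness, below is a minimal PyTorch-style sketch of the training objective described in the abstract, assuming a classification task with a cross-entropy loss on the dense model and an MSE regularizer between dense and sub-model outputs; the function name moredrop_loss, the reg_weight parameter, and the choice of MSE are hypothetical illustrations, and the paper's exact regularizer and weighting may differ.

import torch
import torch.nn.functional as F

def moredrop_loss(model, x, y, reg_weight=1.0):
    # Dense forward pass: dropout disabled (eval mode), gradients still flow.
    model.eval()
    logits_dense = model(x)
    loss_dense = F.cross_entropy(logits_dense, y)

    # Sub-model forward pass: dropout enabled (train mode), detached so that
    # no gradient updates are taken through the sub-model.
    model.train()
    with torch.no_grad():
        logits_sub = model(x)

    # Regularization term derived from the dense-sub model pair.
    reg = F.mse_loss(logits_dense, logits_sub)

    # Primary (dense) loss plus the dense-sub regularizer.
    return loss_dense + reg_weight * reg

The point of the sketch is that only the dense forward pass receives gradients; the dropout (sub-model) output serves purely as a regularization target, matching the abstract's claim of leveraging dropout without gradient updates in the sub-models.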
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6316