Dropout has been instrumental in enhancing the generalization of deep neural networks across many domains. However, its deployment introduces a significant challenge: a model distributional shift between the training and evaluation phases. Previous approaches have primarily concentrated on regularization methods that invariably employ the sub-model loss as the primary loss function. Despite this, these methods still encounter a persistent distributional shift at evaluation, a consequence of the implicit expectation over sub-models inherent to the evaluation process. In this study, we introduce a new approach, Model Regularization for Dropout (MoReDrop). MoReDrop addresses the distributional shift by prioritizing the loss function of the dense model, supplemented by a regularization term derived from the dense-sub model pair. This design allows us to retain the benefits of dropout without requiring gradient updates through the sub-models. To further reduce the computational cost, we propose a lightweight variant, MoReDropL, which trades a degree of generalization ability for a lower computational burden by applying dropout only at the last layer. Our experimental evaluations, conducted on several benchmarks across multiple domains, consistently demonstrate the scalability and efficiency of our proposed algorithms.
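The sketch below illustrates one plausible reading of a MoReDrop-style training step in PyTorch: the primary loss comes from the dense (dropout-free) forward pass, and a regularization term couples the dense outputs to a detached sub-model (dropout-enabled) pass, so no gradients flow through the sub-model. The toy MLP, the function name `moredrop_step`, the weight `alpha`, and the choice of a KL regularizer are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """Toy classifier with a single dropout layer (illustrative only)."""
    def __init__(self, d_in=784, d_hidden=256, n_classes=10, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.drop = nn.Dropout(p)        # active only in train mode
        self.fc2 = nn.Linear(d_hidden, n_classes)

    def forward(self, x):
        return self.fc2(self.drop(F.relu(self.fc1(x))))

def moredrop_step(model, optimizer, x, y, alpha=1.0):
    # Dense forward pass: dropout disabled, gradients flow through this path.
    model.eval()                         # eval mode only toggles dropout off here
    dense_logits = model(x)
    dense_loss = F.cross_entropy(dense_logits, y)

    # Sub-model forward pass: dropout enabled, wrapped in no_grad so the
    # sub-model path receives no gradient updates.
    model.train()
    with torch.no_grad():
        sub_logits = model(x)

    # Regularization term from the dense-sub model pair (assumed KL form).
    reg = F.kl_div(
        F.log_softmax(dense_logits, dim=-1),
        F.softmax(sub_logits, dim=-1),
        reduction="batchmean",
    )

    loss = dense_loss + alpha * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, MoReDropL would correspond to keeping dropout only in the final layer of a deeper network, so that the dropout-free features below it could be computed once and reused across the dense and sub-model passes; this reuse is our assumption about where the reduced computational burden comes from.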