A new perspective on the nature of dropout

06 May 2026 (modified: 12 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: In this work, we study the average behavior of the learning process with dropout in the contexts of linear regression, generalized linear models, matrix factorization, and fully-connected neural networks with dropout in the last layer. We first find that the average behavior does not identify which quantity was originally dropped out. Consequently, the dropout-induced regularization and optimization are ambiguous from the perspective of the average behavior alone. To resolve this, we reformulate the average behavior in terms of the elementary operations that a practitioner can apply in the learning process with dropout, and we then disambiguate the dropout-induced regularization and optimization from the perspective of each reformulation. In the context of linear regression, we show that all of the reformulations yield the same predictions at test time, and that the invariant in these predictions is the square of the coefficient of variation of the dropout distribution. More broadly, we demonstrate that when the mean of the dropout distribution is not equal to one, the penalty term under dropout depends on the data, the parameters, and the predictions at train time.
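As a minimal numerical sketch of the linear-regression claim (assuming Bernoulli "inverted" dropout with keep probability p, so the mask has mean one; the toy data, variable names, and single-sample setup are illustrative and not taken from the paper), averaging the dropped-out squared loss over masks recovers the plain squared loss plus a data- and parameter-dependent quadratic penalty scaled by the squared coefficient of variation (1 - p)/p:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: a single sample with d features.
d = 5
x = rng.normal(size=d)      # features
w = rng.normal(size=d)      # regression weights
y = 1.3                     # target

p = 0.8                     # keep probability of the Bernoulli mask (assumed)
cv2 = (1 - p) / p           # squared coefficient of variation of the mask

# Monte Carlo average of the dropped-out squared loss, using
# "inverted" dropout (mask divided by p) so the mask has mean one.
n_mc = 200_000
masks = rng.binomial(1, p, size=(n_mc, d)) / p
mc_loss = np.mean((y - masks @ (x * w)) ** 2)

# Closed-form average: the plain squared loss plus a quadratic
# penalty on x * w (data times parameters), scaled by cv2.
closed_form = (y - x @ w) ** 2 + cv2 * np.sum((x * w) ** 2)

print(f"Monte Carlo: {mc_loss:.4f}  closed form: {closed_form:.4f}")
# The two agree up to Monte Carlo error.
```

If the mask's mean differs from one, the bias term (y - mean * x @ w)^2 no longer reduces to the plain squared loss, so the extra terms involve the train-time prediction itself, consistent with the abstract's final remark.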
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ikko_Yamane1
Submission Number: 8789