Abstract: Despite dropout’s ubiquity in machine learning, its effectiveness as a form of data augmentation remains under-explored. We address two key questions: (i) When is dropout effective as an augmentation strategy? (ii) Is dropout uniquely effective under these conditions? To explore these questions, we propose Deep Augmentation, a network- and modality-agnostic method that applies dropout or PCA transformations to targeted layers in neural networks. Through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning, we find that uniformly applying dropout across layers does not consistently improve performance. Instead, dropout proves most beneficial in deeper layers and can be matched by alternative augmentations (e.g., PCA). We also show that a stop-gradient operation is critical for ensuring dropout functions effectively as an augmentation, and that performance trends invert when moving from contrastive tasks to supervised tasks. Our analysis suggests that Deep Augmentation helps mitigate inter-layer co-adaptation---a notable issue in self-supervised learning due to the absence of labeled data. Drawing on these insights, we outline a procedure for selecting the optimal augmentation layer and demonstrate that Deep Augmentation can outperform traditional input-level augmentations. This simple yet powerful approach can be seamlessly integrated into a wide range of architectures and modalities, yielding notable gains in both performance and generalization.
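For concreteness, here is a minimal sketch (our illustration, not the authors' code) of how applying dropout at a single targeted hidden layer with a stop-gradient might look in a SimSiam-style contrastive objective. The encoder architecture, the layer index, the choice of which branch carries the stop-gradient, and the cosine loss are all assumptions made for illustration; a PCA-based transform could stand in for the dropout module.

```python
# Illustrative sketch only: perturb one targeted intermediate layer with dropout
# to form a second "view", and stop gradients through the unperturbed branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepAugmentEncoder(nn.Module):
    def __init__(self, dim=128, target_layer=2, p=0.5):
        super().__init__()
        # Small MLP encoder; the augmentation is applied after `target_layer`.
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(4)]
        )
        self.target_layer = target_layer
        self.dropout = nn.Dropout(p)  # hypothetically replaceable by a PCA-based transform

    def forward(self, x, augment=False):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if augment and i == self.target_layer:
                x = self.dropout(x)  # perturb the targeted intermediate representation
        return x

def contrastive_step(encoder, x):
    z_clean = encoder(x, augment=False)
    z_aug = encoder(x, augment=True)
    # Stop-gradient on the clean branch (an assumption); pull the augmented view toward it.
    return -F.cosine_similarity(z_aug, z_clean.detach(), dim=-1).mean()

# Usage: loss = contrastive_step(DeepAugmentEncoder(), torch.randn(32, 128))
```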
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=85LXLHf9SW
Changes Since Last Submission: Below is a summary of the changes made in our revision:
## Clarified Research Objectives:
- We have restructured the introduction to clearly state our primary research question: under what conditions dropout and PCA (with an optional stop-gradient) serve as effective augmentation and regularization strategies. This refocuses the narrative from merely showcasing a versatile augmentation method to a deeper investigation into the underlying mechanisms driving these techniques. We have also aligned the language throughout the paper with this revised focus.
## Refined Claims and Evidence:
- In response to reviewer feedback, we now explicitly define the key phenomenon of “co-adaptation” in Section 5. This definition clarifies what we mean by co-adaptation and outlines the expected behaviors, thereby strengthening the link between our analysis and the experimental evidence provided.
## Updated Related Work:
- We have expanded our discussion of related literature, including a more detailed examination of previous studies (e.g., Wu and Gu, 2015) that investigate how dropout’s effectiveness varies when applied at different layers. This additional context helps situate our contributions within the broader research landscape.
## Revised Terminology and Framing:
- We have addressed minor comments regarding the characterization of dropout, PCA, and stop-gradient. The paper now features a more appropriate title, and there is increased emphasis on these techniques as forms of regularization rather than solely as implicit data augmentation.
These changes resolve previous ambiguities regarding our contributions, ensuring that our investigation into the augmentation and regularization mechanisms of dropout and PCA is both clear and compelling for a broad TMLR audience.
Assigned Action Editor: ~Han_Bao2
Submission Number: 4357