MixFilter: Pre-train Aware Structured Dropout for Domain Generalization

TMLR Paper3145 Authors

07 Aug 2024 (modified: 25 Nov 2024) · Decision pending for TMLR · CC BY 4.0
Abstract: Model ensembling is a widely adopted technique for improving the robustness of convolutional neural network (CNN) classifiers against distribution shifts. This method involves either averaging the predictions of multiple models or combining their weights. However, it comes with considerable computational overhead, as it requires training multiple networks. Recently, fine-tuning with very high dropout rates at the penultimate layer has been shown to mimic many of the benefits of ensembling without requiring multiple training runs. However, a performance gap persists, likely because the regularization is applied only at the final layer of the CNN. In this paper, we present MixFilter, a novel dropout strategy designed for fine-tuning convolutional neural networks that leverage rich pre-trained representations for domain generalization. MixFilter enhances functional diversity across subnetworks by stochastically mixing convolutional filters from all layers of the fine-tuned and pre-trained models. Our experimental results on five domain generalization benchmarks—PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet—indicate that MixFilter achieves out-of-domain accuracy comparable to ensemble-based approaches while avoiding additional inference or training overhead. Anonymized source code is available at \url{https://anonymous.4open.science/r/MixFilter-6EEE}.
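To make the filter-mixing idea in the abstract concrete, below is a minimal PyTorch sketch, assuming a per-filter Bernoulli mixing rule between a fine-tuned model and its frozen pre-trained counterpart. The function name mixed_conv_weights, the probability p, and the state-dict-based implementation are illustrative assumptions, not the authors' released code (see the anonymized repository linked above for the actual method).

```python
import torch
import torch.nn as nn


def mixed_conv_weights(finetuned: nn.Module, pretrained: nn.Module, p: float = 0.5):
    """Return a state dict whose Conv2d filters are stochastically mixed between
    the fine-tuned model and its frozen pre-trained counterpart.

    Sketch under assumptions: each output filter of every convolutional layer is
    independently reverted to its pre-trained weights with probability `p`.
    """
    mixed = {k: v.clone() for k, v in finetuned.state_dict().items()}
    pt_state = pretrained.state_dict()
    for name, module in finetuned.named_modules():
        if isinstance(module, nn.Conv2d):
            key = f"{name}.weight" if name else "weight"
            ft_w, pt_w = mixed[key], pt_state[key]
            # Bernoulli mask over output filters: 1 keeps the fine-tuned filter,
            # 0 swaps in the corresponding pre-trained filter.
            keep = (torch.rand(ft_w.shape[0], device=ft_w.device) > p).float()
            keep = keep.view(-1, 1, 1, 1)
            mixed[key] = keep * ft_w + (1.0 - keep) * pt_w
    return mixed
```

In this reading, a training step would load the returned state dict into a temporary copy of the network for the forward pass, so that different filter subsets are active at different steps, analogous to structured dropout; the actual mixing granularity and schedule are specified in the paper.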
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - Added new ablation studies for different backbones. - Expanded Section 4.1 to further clarify the computational advantages of MixFilter compared to ensemble-based baselines.
Assigned Action Editor: ~Aaron_Klein1
Submission Number: 3145