MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Recent works have connected Masked Image Modeling (MIM) with consistency regularization in Unsupervised Domain Adaptation (UDA). However, they treat masking merely as a special form of deformation applied to the input images and neglect theoretical analysis, which leads to a superficial understanding of masked reconstruction and leaves its potential for enhancing feature extraction and representation learning under-exploited. In this paper, we reframe masked reconstruction as a sparse signal reconstruction problem and theoretically prove that the dual form of complementary masks possesses superior capabilities in extracting domain-agnostic image features. Based on this insight, we propose MaskTwins, a simple yet effective UDA framework that integrates masked reconstruction directly into the main training pipeline. MaskTwins uncovers intrinsic structural patterns that persist across disparate domains by enforcing consistency between the predictions for images masked in complementary ways, enabling domain generalization in an end-to-end manner. Extensive experiments verify the superiority of MaskTwins over baseline methods on natural and biological image segmentation. These results demonstrate the significant advantages of MaskTwins in extracting domain-invariant features without separate pre-training, offering a new paradigm for domain-adaptive segmentation. The source code is available at https://github.com/jwwang0421/masktwins.
Lay Summary: Masking means occluding parts of an image. By increasing the difficulty of the learning task, this technique pushes models to understand and analyze images more deeply. However, many machine learning scientists traditionally treat masking merely as a special form of data augmentation and overlook its theoretical analysis. We aim to further explore the potential of masked reconstruction for enhancing feature extraction and representation learning, focusing on complementary masks: a pair of non-overlapping masks that together cover the entire image without redundancy. Our study theoretically demonstrates that this complementary masking strategy enhances model robustness. Based on it, we propose MaskTwins, a simple yet effective training framework for domain adaptation, enabling a model trained in one scenario to learn essential features that generalize to a different scenario. Our findings provide a new paradigm for domain-adaptive segmentation and offer theoretical insights for masked image modeling. A minimal code sketch of the complementary-masking consistency idea follows below.
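To make the idea concrete, here is a minimal sketch of the complementary-masking consistency objective, assuming a PyTorch segmentation model that maps an image batch to per-pixel class logits. All names here (`complementary_masks`, `consistency_loss`, `patch_size`, `mask_ratio`) are illustrative assumptions, not the authors' actual API; the real implementation lives in the linked repository.

```python
# Hedged sketch of complementary-mask consistency, assuming a PyTorch
# segmentation model. Function and parameter names are illustrative.
import torch
import torch.nn.functional as F

def complementary_masks(x, patch_size=32, mask_ratio=0.5):
    """Return a random patch-wise binary mask and its complement.

    Together the two masks are non-overlapping and cover the whole
    image, matching the paper's definition of complementary masks.
    """
    b, _, h, w = x.shape
    gh, gw = h // patch_size, w // patch_size
    grid = (torch.rand(b, 1, gh, gw, device=x.device) < mask_ratio).float()
    m = F.interpolate(grid, size=(h, w), mode="nearest")  # patch grid -> pixels
    return m, 1.0 - m

def consistency_loss(model, x_target, patch_size=32):
    """Agreement between predictions under complementary masked views.

    Each view hides exactly what the other reveals, so agreement forces
    the model to rely on structure that survives either occlusion.
    """
    m, m_comp = complementary_masks(x_target, patch_size)
    logits_a = model(x_target * m)        # prediction from one masked view
    logits_b = model(x_target * m_comp)   # prediction from the complement
    p_a = F.softmax(logits_a, dim=1)
    p_b = F.softmax(logits_b, dim=1)
    return F.mse_loss(p_a, p_b)           # symmetric consistency term
```

In the full framework this agreement term would be combined with a supervised loss on labeled source images; the actual loss weighting, masking schedule, and any pseudo-labeling details follow the paper and repository rather than this sketch.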
Link To Code: https://github.com/jwwang0421/masktwins
Primary Area: Applications->Computer Vision
Keywords: domain adaptation, unsupervised learning, masked image modeling, semantic segmentation, complementary masking
Submission Number: 1826