Family Matters: A Systematic Study of Spatial vs. Frequency Masking for Continual Test-Time Adaptation
Abstract: Recent continual test-time adaptation (CTTA) methods adopt masked image modeling to stabilize learning under distribution shift, yet each treats its masking family $\mathcal{F}$ as a fixed design choice and innovates only along the selection strategy $\mathcal{S}$, leaving the family axis underexplored. We present a systematic empirical study that isolates this axis. Using a controlled CTTA instantiation, Mask to Adapt (M2A), which fixes $\mathcal{S}{=}\textit{random}$ and uses standard losses, we vary only $\mathcal{F}$ across spatial (patch, pixel) and frequency (all-band, low-band, high-band) families while keeping every other component identical. The study contributes design guidance for the CTTA settings we evaluated: (1)~\emph{the masking family determines whether adaptation compounds useful structure or compounds errors}: on patch-tokenized architectures, spatial masking accumulates stable representations over long streams, while frequency masking collapses catastrophically. We characterize this instability through a \emph{structural-preservation} account, in which spatial coherence retains the broad-spectrum redundancy needed to avoid a terminal overlap with a corruption's spectral signature; (2)~\emph{the optimal family depends on architecture-task alignment}: on CNNs, whose overlapping receptive fields dilute patch occlusion, the family gap vanishes, whereas on fine-grained tasks with global cues and on large-capacity ViTs, frequency masking becomes competitive. In confounded system-level comparisons, where baselines also differ in losses and auxiliary components, M2A's random selection performs comparably to heuristic strategies; we treat this observation as suggestive context rather than a controlled quantification of $\mathcal{S}$'s relative importance.
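The two family axes can be made concrete with a short NumPy sketch. It assumes zero-out masking on a real (C, H, W) image with random selection ($\mathcal{S}{=}\textit{random}$); the function names, the square non-overlapping patch grid, and the normalized radial `cutoff` of 0.25 that separates the low band from the high band are illustrative assumptions, not details taken from M2A. In particular, this patch grid is not claimed to reproduce the Block-Patch sampler named as the default in the revision notes.

```python
import numpy as np


def patch_mask(h, w, patch, ratio, rng):
    """Random spatial mask over a non-overlapping patch grid (illustrative).

    Returns an (h, w) float array with 1 = keep and 0 = zeroed out.
    Setting patch=1 reduces this to random pixel masking.
    """
    if h % patch or w % patch:
        raise ValueError("patch size must divide the image height and width")
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_mask = int(round(ratio * n_patches))
    # Random selection strategy: drop patches uniformly at random.
    drop = rng.choice(n_patches, size=n_mask, replace=False)
    grid = np.ones(n_patches)
    grid[drop] = 0.0
    # Upsample the patch-level keep/drop grid to pixel resolution.
    return np.kron(grid.reshape(gh, gw), np.ones((patch, patch)))


def frequency_mask(img, ratio, band, rng, cutoff=0.25):
    """Zero out a random fraction of Fourier coefficients inside one band.

    img: (C, H, W) real array. band: "all", "low", or "high".
    cutoff: hypothetical normalized radius splitting low from high band.
    The same frequency mask is shared across all channels.
    """
    _, h, w = img.shape
    # rfft2 keeps only the non-redundant half spectrum; irfft2 returns a real image.
    spec = np.fft.rfft2(img, axes=(-2, -1))
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    radius /= radius.max()  # normalize radial frequency to [0, 1]
    if band == "all":
        in_band = np.ones(radius.shape, dtype=bool)
    elif band == "low":
        in_band = radius < cutoff
    elif band == "high":
        in_band = radius >= cutoff
    else:
        raise ValueError(f"unknown band: {band!r}")
    # Random selection restricted to coefficients inside the chosen band.
    candidates = np.flatnonzero(in_band)
    n_mask = int(round(ratio * candidates.size))
    drop = rng.choice(candidates, size=n_mask, replace=False)
    keep = np.ones(radius.size)
    keep[drop] = 0.0
    masked = spec * keep.reshape(radius.shape)
    return np.fft.irfft2(masked, s=(h, w), axes=(-2, -1))


# Usage: one masked view per family on a random 3x32x32 image.
rng = np.random.default_rng(0)
img = rng.random((3, 32, 32))
patch_view = img * patch_mask(32, 32, patch=4, ratio=0.5, rng=rng)
pixel_view = img * patch_mask(32, 32, patch=1, ratio=0.5, rng=rng)
low_view = frequency_mask(img, ratio=0.5, band="low", rng=rng)
high_view = frequency_mask(img, ratio=0.5, band="high", rng=rng)
all_view = frequency_mask(img, ratio=0.5, band="all", rng=rng)
```

In this sketch the patch and pixel families differ only in the granularity of the occluded region, while the three frequency families differ only in which coefficients are eligible to be dropped, which mirrors the one-axis-at-a-time design described above.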
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: **Revision-1.1 Apr-21-2026**
- Updated Section 1 (Introduction) to frame structural preservation as a conceptual explanation.
- Updated Section 3 (Methodology, Preliminaries) to present structural preservation as a qualitative lens based on spatial coherence and spectral overlap.
- Updated Section 3 (Methodology, scope clarification) to limit the study to input-based zero-out masking families.
- Updated Section 3.1 (Random Spatial Masking) to clarify that the default patch masking is Block-Patch.
- Updated Section 3.1 (Patch-based masking) to note that token-grid alignment may strengthen masking on patch-tokenized architectures.
- Updated Section 3.3.1 (Total Loss) to clarify that M2A uses prediction-based, not reconstruction, losses.
- Updated Section 4.2 (Implementation Details) to clarify that main-text patch results use Block-Patch, while Free-Patch and Grid-Patch are appendix-only.
- Updated Section 4.3.1 (Per-Corruption Family Profiles) to re-emphasize that the main patch family is Block-Patch.
- Updated Section 4.3.2 (Class Activation Map) to note that CAM trends broadly match Figure 2.
- Updated Section 4.6 (Architecture Scoping) to frame the architecture results as a boundary case.
- Updated Section 4.6.3 (Use Case: Aquaculture) to identify aquaculture as the clearest task-level boundary case.
- Updated Section 4.6.4 (Ablation Study) to specify the masking-ratio schedules and number of views.
- Updated Section 5 (Conclusion and Future Work) to separate controlled findings from broader interpretation.
- Updated Table 1 to add SPA results on CIFAR10-C, CIFAR100-C, and ImageNet-C.
- Added Appendix A.3 (Structural-Preservation Hypothesis) to present structural preservation as a conceptual formalization based on spatial coherence and spectral overlap.
- Updated Appendix A.4.1 (Spatial Masking / Patch-based masking) to clarify that the default patch masking is Block-Patch.
- Added Appendix A.5 (Mask Sampling Methods) to compare Block-Patch, Free-Patch, and Grid-Patch with implementation details.
- Updated Appendix A.8.1 (Hyperparameter Robustness) to expand the appendix ablations.
- Added Appendix A.8.2 (Masking Families; Figures 15–17) to test whether the family effect persists beyond the default M2A setup.
- Added Appendix A.9 (Mask Selection Heuristics) to compare Random, Center, and Uncertainty with the patch family fixed.
- Added Appendix A.11 (Extended Discussion) to give an M2A-scoped interpretation of the results.
**Revision-1.2 Apr-22-2026**
- Updated Section 5 (Conclusion and Future Work) to clarify that the masking family is a major determinant of adaptation stability.
Assigned Action Editor: ~Stephen_Lin1
Submission Number: 7710