Keywords: Continual Test-Time Adaptation, Masked Image Modeling, Teacher-Student Learning, Catastrophic Forgetting, Vision Transformers
TL;DR: MiDEA achieves state-of-the-art continual test-time adaptation by combining global–local self-distillation with a dual-rate EMA, running 3× faster than existing methods.
Abstract: Continual test-time adaptation (CTTA) aims to maintain model accuracy under non-stationary distribution shifts when source training data are unavailable. Existing methods that update models using their own predictions ("pseudo-labels") face a fundamental trade-off between rapid adaptation to new conditions and retention of previously learned knowledge. Many recent approaches address this with multiple forward passes per sample, which makes them impractical for deployment under strict latency or memory constraints.
To overcome these limitations, we introduce **MiDEA** (**M**asked-**i**mage modeling with **D**ual-**E**MA **A**daptation), a simple yet powerful method specifically designed for efficient continual test-time adaptation. MiDEA maintains a single, decoder-free teacher–student architecture that measures distribution gaps at both global image and local patch levels, guiding the student to minimize these gaps at each adaptation step.
At the global level, MiDEA aligns full-image predictions from a clean teacher view with those from a strongly augmented student view. Locally, it matches the teacher's patch-wise representations to the student's masked patch embeddings, ensuring that fine-grained spatial details adapt effectively. The teacher model is refreshed using a novel dual-rate exponential moving average (EMA): attention layers update rapidly to absorb new visual conditions, while MLP weights drift more slowly to preserve prior knowledge, empirically reducing catastrophic forgetting.
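To make the mechanism concrete, the sketch below shows one possible adaptation step inferred from this description alone. It is a minimal illustration, not the authors' implementation: the helper functions `augment` and `mask_patches`, the momentum values, the loss forms (KL divergence for the global term, smooth-L1 for the local term), and the `"attn"` name-matching heuristic are all assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a MiDEA-style adaptation step, based only on the
# abstract. `augment` and `mask_patches` are assumed helpers; the model is
# assumed to return (global_prediction, patch_tokens).

@torch.no_grad()
def dual_rate_ema_update(teacher, student, m_attn=0.99, m_mlp=0.999):
    """Dual-rate EMA: attention weights track the student quickly, while
    MLP weights drift slowly to retain prior knowledge. Momentum values
    here are illustrative, not taken from the paper."""
    for (name, t_p), (_, s_p) in zip(
        teacher.named_parameters(), student.named_parameters()
    ):
        m = m_attn if "attn" in name else m_mlp
        t_p.mul_(m).add_(s_p, alpha=1.0 - m)

def adaptation_step(student, teacher, optimizer, images, lam=1.0):
    """One single-pass-per-model, decoder-free adaptation step."""
    with torch.no_grad():
        # Clean view through the teacher: full-image prediction + patch tokens.
        t_global, t_patches = teacher(images)

    # Strongly augmented, partially masked view through the student.
    strong = augment(images)             # assumed augmentation helper
    masked, mask = mask_patches(strong)  # assumed patch-masking helper
    s_global, s_patches = student(masked)

    # Global alignment: match full-image predictions across views.
    loss_global = F.kl_div(
        F.log_softmax(s_global, dim=-1),
        F.softmax(t_global, dim=-1),
        reduction="batchmean",
    )

    # Local alignment: match teacher patch tokens at masked positions.
    loss_local = F.smooth_l1_loss(s_patches[mask], t_patches[mask])

    loss = loss_global + lam * loss_local
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Refresh the teacher with the dual-rate EMA after each step.
    dual_rate_ema_update(teacher, student)
    return loss.item()
```

Note that because the teacher is updated only through the EMA, a single forward pass per model suffices at each step, which is where the claimed latency advantage over multi-pass methods would come from.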
On ViT-Base, MiDEA reduces ImageNet-C error to **38.1%**, 18 percentage points below a frozen model and 5 points below the previous CTTA state of the art, while running **3× faster** than recent multi-pass methods. Notably, accuracy remains stable even at a batch size of 1, making MiDEA practical for real-time, memory-constrained deployments.
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 15325