Abstract: Test-time adaptation (TTA) updates a model online during deployment to improve robustness to distribution shifts.
While TTA updates improve robustness, they add latency: computing the updates during inference makes deployment impractical for latency-sensitive systems.
We present \textbf{Caravan}, an asynchronous TTA framework that decouples inference from update computation.
Caravan maintains three concurrent streams that run on a \emph{single GPU}: a high-priority inference stream and two low-priority streams for computing updates.
Because updates necessarily lag behind inference, Caravan revisits sample selection. It updates only the normalization-layer affine parameters and running statistics, and only after (i) entropy filtering to retain reliable samples and (ii) gradient-consistency filtering of per-sample entropy gradients w.r.t. the last normalization layer to discard conflicting updates.
Caravan improves latency by up to $6.8\times$ and accuracy by 1.99\% over synchronous TTA methods on ImageNet-C with ResNet50-BN.
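
The two-stage sample selection described in the abstract can be illustrated with a minimal sketch. This is not Caravan's implementation: the abstract gives neither the entropy threshold nor the consistency criterion, so the sketch assumes a threshold of $0.4 \ln C$ for $C$ classes and assumes that a sample is "consistent" when its gradient has positive cosine similarity with the mean gradient of the retained samples. The function names (`entropy_filter`, `consistency_filter`) and the toy logits and gradients are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one sample's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_filter(batch_logits, threshold):
    # Stage (i): keep indices of samples with low predictive entropy.
    return [i for i, logits in enumerate(batch_logits)
            if entropy(softmax(logits)) < threshold]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)

def consistency_filter(grads, indices, min_cos=0.0):
    # Stage (ii), ASSUMED criterion: keep samples whose per-sample gradient
    # points in roughly the same direction as the mean gradient of the
    # candidates that survived stage (i).
    if not indices:
        return []
    dim = len(grads[indices[0]])
    mean = [sum(grads[i][d] for i in indices) / len(indices) for d in range(dim)]
    return [i for i in indices if cosine(grads[i], mean) > min_cos]

# Toy batch: 4 samples, 3 classes. Sample 1 is nearly uniform (unreliable).
batch_logits = [[8.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 7.0, 0.0], [0.0, 0.0, 9.0]]
# Hypothetical per-sample entropy gradients w.r.t. two parameters of the
# last normalization layer; sample 3 conflicts with samples 0 and 2.
grads = [[1.0, 0.5], [0.0, 0.0], [0.8, 0.7], [-1.0, -0.6]]

threshold = 0.4 * math.log(3)  # assumed threshold, not from the paper
kept = entropy_filter(batch_logits, threshold)
selected = consistency_filter(grads, kept)
print(kept, selected)
```

In this toy batch, the entropy stage drops the near-uniform sample and the consistency stage drops the sample whose gradient opposes the others; only the remaining samples would contribute to the normalization-layer update.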
Submission Number: 25