HERA: Efficient Test-Time Adaptation for Cross-Domain Few-Shot Segmentation with Vision Foundation Models
Keywords: Test-Time Adaptation; Cross-Domain Few-Shot Segmentation; Vision Foundation Models; Parameter-Efficient Fine-tuning
TL;DR: HERA: a source-free test-time adaptation framework that turns a few labeled supports into reliable guidance for VFMs in CD-FSS.
Abstract: Vision foundation models (VFMs) excel across vision tasks, but applying them to Cross-Domain Few-Shot Segmentation (CD-FSS) faces two key obstacles: (i) pronounced domain shift that misaligns support–query correspondence, and (ii) few-shot supervision that precludes source-data retraining. Existing frozen-backbone adaptation rarely treats matching risk as a first-class objective, leaving support–query alignment fragile. We introduce Hierarchical Episode-wise Risk Alignment (HERA), a unified VFM-based principle that contracts alignment risk top-down, across layers, attention, and pixels, under a frozen backbone, thereby reducing support–query mismatch. Concretely, Hierarchical Layer Routing (HLR) routes each episode to its optimal layer to stabilize semantics; Gaussian-Guided Attention (GGA) calibrates self-attention with entropy-gated Gaussian priors, strengthening locality while preserving global coverage; and Pixelwise Adaptive Reweighting (PAR) reweights per-pixel logits with lightweight residuals to recover thin structures and denoise low-contrast regions. Together, these modules form a top-down risk-contraction path that unlocks ViT capacity for hierarchical semantics, structured locality, and fine-grained discrimination. By default, HERA is instantiated on DINOv3 and generalizes across ViT backbones. In extensive evaluations, HERA surpasses the state of the art (+6.51%) without source data or end-to-end retraining, yielding a lightweight, deployable recipe for leveraging VFMs in CD-FSS.
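To make the GGA idea described in the abstract concrete, below is a minimal sketch of entropy-gated Gaussian calibration of ViT self-attention: queries whose attention is diffuse (high entropy) receive a stronger Gaussian locality bias over the patch grid, while confident queries keep their global pattern. The function name, shapes, and the exact gating form are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of entropy-gated Gaussian attention calibration.
# Names, shapes, and the gating formula are assumptions for exposition only.
import math
import torch

def gaussian_guided_attention(q, k, v, grid_hw, sigma=2.0):
    """q, k, v: (B, heads, N, d) patch-token features on an h x w grid."""
    B, H, N, d = q.shape
    h, w = grid_hw
    assert h * w == N, "tokens must form an h x w grid"

    # Standard scaled dot-product attention logits.
    logits = q @ k.transpose(-2, -1) / d ** 0.5                          # (B, H, N, N)

    # 2D Gaussian locality prior over patch-grid distances (log-space bias).
    ys, xs = torch.meshgrid(
        torch.arange(h, device=q.device),
        torch.arange(w, device=q.device),
        indexing="ij",
    )
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()   # (N, 2)
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)     # (N, N)
    prior = -dist2 / (2 * sigma ** 2)

    # Entropy gate: diffuse (high-entropy) queries get more locality prior.
    attn = logits.softmax(dim=-1)
    entropy = -(attn * attn.clamp_min(1e-8).log()).sum(-1, keepdim=True)  # (B, H, N, 1)
    gate = entropy / math.log(N)                                          # normalized to [0, 1]

    calibrated = (logits + gate * prior).softmax(dim=-1)
    return calibrated @ v                                                 # (B, H, N, d)
```

In this sketch the backbone stays frozen; only the additive bias on the attention logits changes, which is one way a locality prior can be injected without retraining the ViT.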
Supplementary Material: pdf
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 9522