Abstract: The rapid development of Vision Foundation Models (VFMs) brings superior out-of-domain generalization to a variety of downstream tasks.
Among them, domain generalized semantic segmentation (DGSS) poses unique challenges, as cross-domain images share common pixel-wise content information (i.e., semantics) but vary greatly in style (e.g., urban landscape, environment dependencies).
How to effectively fine-tune VFMs for DGSS has recently become an open research topic for the vision community.
In this paper, we present a novel Spectral-decomposited Tokens (SET) learning framework to push the frontier.
Going beyond the existing paradigm of fine-tuning tokens on a frozen backbone, the proposed SET focuses on how to learn style-invariant features from these learnable tokens.
Specifically, the frozen VFM features are first decomposed into phase and amplitude components in the frequency space, where the phase component mainly reflects the content and the amplitude component mainly reflects the style.
Then, learnable tokens are adapted to learn the content and style, respectively.
As the cross-domain differences lie mainly in the style carried by the amplitude component, such information is decoupled from the tokens.
Consequently, the refined feature maps represent the pixel-wise content more stably despite the style variation.
Extensive cross-domain experiments under a variety of backbones and VFMs show state-of-the-art performance.
We will make the source code publicly available.
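
The phase/amplitude decomposition mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `decompose` and `recompose`, the random stand-in feature map, and the use of a simple global rescaling as a proxy for a style change are all assumptions made for illustration. It only shows that such a rescaling changes the amplitude spectrum while leaving the phase spectrum untouched.

```python
import numpy as np

def decompose(feat):
    """Split a (C, H, W) feature map into amplitude and phase via a 2D FFT."""
    spec = np.fft.fft2(feat, axes=(-2, -1))
    return np.abs(spec), np.angle(spec)

def recompose(amplitude, phase):
    """Rebuild the spatial feature map from amplitude and phase."""
    spec = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spec, axes=(-2, -1)).real

# Hypothetical stand-in for frozen backbone features (random values, not real VFM output).
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))

# Assumed toy "style" change: a global positive rescaling of the features.
restyled = 2.0 * feat

amp, pha = decompose(feat)
amp_s, pha_s = decompose(restyled)

phase_unchanged = bool(np.allclose(pha, pha_s))        # phase survives the rescaling
amplitude_changed = not np.allclose(amp, amp_s)        # amplitude absorbs the rescaling
roundtrip_ok = bool(np.allclose(recompose(amp, pha), feat))  # decomposition is invertible
```

Real style shifts are far richer than a global rescaling, so this sketch only motivates why the amplitude component is the natural place to look for style-related variation.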
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: The rapid development of Vision Foundation Models (VFMs) brings superior out-of-domain generalization to a variety of downstream tasks.
How to effectively fine-tune VFMs for domain generalized semantic segmentation (DGSS) has recently become an open research topic for the vision community.
In this paper, we present a novel amplitude-decoupled token (ADT) learning framework to push the frontier.
Supplementary Material: zip
Submission Number: 870