Keywords: Multi-Source Multi-Modal Domain Adaptation, Phase-guided Perceptual Alignment, Semantic Structures, Domain-invariant Style
TL;DR: Phase-guided Perceptual Alignment (PGPA) transfers low-frequency amplitude spectra from target to source images while preserving source phase, aligning visual styles across domains without distorting semantic structure.
Abstract: Multi-Source Multi-Modal Domain Adaptation (MSM$^2$DA) leverages data from multiple source domains and modalities to train machine learning models that generalize well across domains. Existing MSM$^2$DA methods mostly rely on structural semantic alignment of visual data to strengthen correlations across modalities, while neglecting the low-frequency perceptual shifts in visual data that hinder cross-modal fusion; visual data are particularly sensitive to such domain shifts, including low-level variations in style and illumination. To address this problem, we propose Phase-guided Perceptual Alignment (PGPA), which aligns visual styles by transferring low-frequency spectral components from target to source images while preserving high-frequency semantic structures. Specifically, PGPA decomposes images into amplitude and phase spectra in the Fourier domain, where the amplitude captures style-related low-level statistics and the phase retains high-level structural semantics. By selectively blending the amplitude of the target image with the phase of the source image, our method improves diversity and ensures domain-invariant style adaptation without distorting critical semantic details. Furthermore, we provide a theoretical bound that formalizes the effectiveness of our approach, showing that PGPA improves cross-domain generalization within a provable bound. Extensive experiments demonstrate that our approach significantly improves performance on cross-domain generalization tasks.
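The abstract describes a Fourier-domain amplitude/phase blending step. Below is a minimal NumPy sketch of that idea, not the paper's implementation: the function name `pgpa_stylize` and the low-frequency window parameter `beta` are hypothetical, and the paper may blend amplitudes differently (e.g., with a soft mixing coefficient rather than a hard swap).

```python
import numpy as np

def pgpa_stylize(src_img, tgt_img, beta=0.1):
    """Blend the low-frequency amplitude of a target image into a source
    image while keeping the source phase (semantic structure) intact.

    src_img, tgt_img: float arrays of shape (H, W, C) in [0, 1].
    beta: fraction of the spectrum (per side, centered) whose amplitude
          is replaced -- a hypothetical knob, not the paper's notation.
    """
    # Per-channel 2-D FFT; shift the zero-frequency bin to the center
    src_fft = np.fft.fftshift(np.fft.fft2(src_img, axes=(0, 1)), axes=(0, 1))
    tgt_fft = np.fft.fftshift(np.fft.fft2(tgt_img, axes=(0, 1)), axes=(0, 1))

    # Decompose into amplitude (style statistics) and phase (structure)
    src_amp, src_phase = np.abs(src_fft), np.angle(src_fft)
    tgt_amp = np.abs(tgt_fft)

    # Replace only the centered low-frequency amplitude block with the
    # target's, leaving high-frequency amplitude and all phase untouched
    h, w = src_img.shape[:2]
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    ch, cw = h // 2, w // 2
    mixed_amp = src_amp.copy()
    mixed_amp[ch - bh:ch + bh, cw - bw:cw + bw] = \
        tgt_amp[ch - bh:ch + bh, cw - bw:cw + bw]

    # Recombine target-style amplitude with source phase and invert
    mixed_fft = np.fft.ifftshift(mixed_amp * np.exp(1j * src_phase),
                                 axes=(0, 1))
    out = np.fft.ifft2(mixed_fft, axes=(0, 1)).real
    return np.clip(out, 0.0, 1.0)
```

In this sketch the source phase is kept in full, so object layout and edges survive; only the low-frequency amplitude window, which carries global style and illumination statistics, is taken from the target.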
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 1664