Bridging Domains with Approximately Shared Features
TL;DR: We theoretically analyze multi-source representation learning and propose a disentangled representation learning method.
Abstract: Machine learning models can suffer from performance degradation when applied to new tasks due to distribution shifts. Feature representation learning offers a robust solution to this issue. However, a fundamental challenge remains in devising the optimal strategy for feature selection. The existing literature is somewhat paradoxical: some works advocate learning invariant features from source domains, while others favor more diverse features. To better understand this tension, we propose a statistical framework that evaluates the utility of each feature (i.e., how differently the feature is used in each source task) based on the variance of its correlation with $y$ across different domains. Under our framework, we design and analyze a learning procedure consisting of learning content features (comprising both invariant and approximately shared features) from source tasks and fine-tuning them on the target task. Our theoretical analysis highlights the significance of learning approximately shared features, beyond strictly invariant ones, when distribution shifts occur. Our analysis also yields an improved population risk on target tasks compared to previous results. Inspired by our theory, we introduce ProjectionNet, a practical method to distinguish content features from environmental features via \textit{explicit feature space control}, further corroborating our theoretical findings.
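The abstract's utility measure, the variance across source domains of a feature's correlation with $y$, can be made concrete with a small sketch. The snippet below is only an illustration of that idea, not the paper's actual framework or the ProjectionNet method; the function name `feature_utilities` and the toy data are hypothetical.

```python
import numpy as np

def feature_utilities(X_by_domain, y_by_domain):
    """Illustrative utility score per feature: the variance, across source
    domains, of each feature's Pearson correlation with the label y.

    X_by_domain: list of (n_e, d) arrays, one per source domain
    y_by_domain: list of (n_e,) label arrays
    Returns a (d,) array; low variance suggests an (approximately) invariant
    feature, high variance suggests an environment-specific one.
    """
    corrs = []
    for X, y in zip(X_by_domain, y_by_domain):
        Xc = X - X.mean(axis=0)            # center features
        yc = y - y.mean()                  # center labels
        denom = Xc.std(axis=0) * yc.std() * len(y)
        # Pearson correlation of every feature with y in this domain
        corrs.append((Xc * yc[:, None]).sum(axis=0) / np.maximum(denom, 1e-12))
    return np.var(np.stack(corrs), axis=0)

# Toy usage: two domains, three features; feature 0 is stable across domains,
# feature 2 flips its relationship with y, so it should score a high variance.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(500, 3)); y1 = X1[:, 0] + 0.5 * X1[:, 2] + rng.normal(size=500)
X2 = rng.normal(size=(500, 3)); y2 = X2[:, 0] - 0.5 * X2[:, 2] + rng.normal(size=500)
print(feature_utilities([X1, X2], [y1, y2]))
```

Under this reading, features with near-zero variance are candidates for invariant features, moderate-variance features for the approximately shared content features the paper emphasizes, and high-variance ones for environmental features.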
Submission Number: 187