A Language Anchor-Guided Method for Robust Noisy Domain Generalization

TMLR Paper5027 Authors

04 Jun 2025 (modified: 16 Jun 2025), Under review for TMLR, CC BY 4.0
Abstract: Real-world machine learning applications are often hindered by two critical challenges: distribution shift and label noise. Networks tend to overfit to redundant, uninformative features in the training distribution, which undermines their ability to generalize to the target domain. Noisy data exacerbates this issue by inducing additional overfitting to noise, so existing domain generalization methods fail to distinguish invariant features from spurious ones. To address these challenges, we propose Anchor Alignment and Adaptive Weighting (A3W), a novel sample-reweighting algorithm guided by natural language processing (NLP) anchors that aims to extract representative features. Specifically, A3W uses semantic representations derived from natural language models as a source of domain-invariant prior knowledge. We also introduce a weighted loss function that dynamically adjusts the contribution of each sample based on its distance to the corresponding NLP anchor, thereby improving the model's resilience to noisy labels. Extensive experiments on benchmark datasets show that A3W outperforms state-of-the-art domain generalization methods, improving both accuracy and robustness across various datasets and noise levels.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Wenbing_Huang1
Submission Number: 5027
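
The abstract describes a weighted loss in which each sample's contribution depends on its distance to the NLP anchor of its label. The abstract does not give the exact weighting rule, so the following is only a minimal sketch under stated assumptions: cosine distance, an exponential weighting with a hypothetical `temperature` parameter, and weights normalized to have mean 1. The function names (`cosine_distance`, `anchor_weights`, `weighted_loss`) are illustrative and not taken from the paper.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def anchor_weights(features, labels, anchors, temperature=1.0):
    """Assumed rule: weight ~ exp(-distance / temperature), scaled to mean 1.

    `anchors` maps each class label to a text-derived embedding. Samples
    far from their label's anchor (possibly mislabeled) get smaller weights.
    """
    raw = [
        math.exp(-cosine_distance(f, anchors[y]) / temperature)
        for f, y in zip(features, labels)
    ]
    total = sum(raw)
    return [len(raw) * r / total for r in raw]

def weighted_loss(per_sample_losses, weights):
    """Average of per-sample losses, each scaled by its weight."""
    n = len(per_sample_losses)
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / n

# Toy example: two classes with hypothetical 2-D anchors.
anchors = {0: [1.0, 0.0], 1: [0.0, 1.0]}
features = [[0.9, 0.1], [0.1, 0.9]]
labels = [0, 0]  # the second sample lies near anchor 1, so its label is suspect
weights = anchor_weights(features, labels, anchors)
loss = weighted_loss([0.2, 2.5], weights)
```

In this toy setup the second sample is far from the anchor of its assigned label, so it receives the smaller weight and its large loss contributes less to the total than it would under uniform averaging.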