Knowledge-Guided Wasserstein Distributionally Robust Optimization

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We propose a novel knowledge-guided Wasserstein distributionally robust optimization framework for regression and classification, proving its equivalence to shrinkage estimation based on collinear similarity with prior knowledge.
Abstract: Wasserstein Distributionally Robust Optimization (WDRO) is a principled framework for robust estimation under distributional uncertainty. However, its standard formulation can be overly conservative, particularly in small-sample regimes. We propose a novel knowledge-guided WDRO (KG-WDRO) framework for transfer learning, which adaptively incorporates multiple sources of external knowledge to improve generalization accuracy. Our method constructs smaller Wasserstein ambiguity sets by controlling the transportation along directions informed by the source knowledge. This strategy alleviates perturbations of the predictive projection of the covariates and protects against information loss. Theoretically, we establish the equivalence between our WDRO formulation and knowledge-guided shrinkage estimation based on collinear similarity, which ensures tractability and geometrizes the feasible set. This equivalence also yields a novel and general interpretation of recent shrinkage-based transfer learning approaches from the perspective of distributional robustness. In addition, our framework adjusts for scaling differences in the regression models between the source and the target, and it accommodates general types of regularization such as the lasso and ridge. Extensive simulations demonstrate the superior performance and adaptivity of KG-WDRO in enhancing small-sample transfer learning.
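To make the stated equivalence concrete, here is a minimal sketch (not the authors' implementation) of knowledge-guided shrinkage for linear regression with squared loss and a single source. The idea illustrated: where standard WDRO induces a ridge-type penalty on the full coefficient vector, a knowledge-guided variant penalizes only the component of the coefficients orthogonal to the source coefficient, with the source-target scaling profiled out. The function name `kg_shrinkage_regression` and the symbols `beta_src` and `lam` are illustrative assumptions, not the paper's API.

```python
import numpy as np

# Illustrative sketch of knowledge-guided shrinkage (assumed form):
#   min_beta ||y - X beta||^2 + lam * min_gamma ||beta - gamma * beta_src||^2
# Profiling out the scaling gamma gives min_gamma ||beta - gamma*beta_src||^2
# = ||P_perp beta||^2, where P_perp projects onto the orthogonal complement
# of beta_src. The problem becomes a generalized ridge regression whose
# penalty leaves the source direction unshrunk.

def kg_shrinkage_regression(X, y, beta_src, lam):
    """Knowledge-guided shrinkage estimator (illustrative sketch)."""
    p = X.shape[1]
    u = beta_src / np.linalg.norm(beta_src)   # unit direction of source knowledge
    P_perp = np.eye(p) - np.outer(u, u)       # projector orthogonal to beta_src
    # P_perp is symmetric and idempotent, so the penalty Hessian is lam * P_perp
    # and the normal equations are (X^T X + lam * P_perp) beta = X^T y.
    return np.linalg.solve(X.T @ X + lam * P_perp, X.T @ y)

# Toy usage: a small-sample target whose coefficients are a rescaled,
# lightly perturbed version of the source coefficients.
rng = np.random.default_rng(0)
n, p = 30, 10
beta_src = rng.normal(size=p)                       # prior knowledge from a source
beta_true = 0.5 * beta_src + 0.1 * rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)
beta_hat = kg_shrinkage_regression(X, y, beta_src, lam=5.0)
print(np.linalg.norm(beta_hat - beta_true))
```

Because the penalty vanishes along the span of `beta_src`, the estimator shrinks only in directions unsupported by the prior knowledge, which mirrors the abstract's description of restricting transportation along knowledge-informed directions while adjusting for source-target scaling.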
Lay Summary: Machine learning models often perform poorly when applied to small datasets, especially if the new data differs from what the model was originally trained on — a common scenario in healthcare or social science. A promising technique called distributionally robust optimization (DRO) helps guard against this issue by preparing models for worst-case scenarios. However, standard DRO can be too cautious, leading to less accurate predictions. We introduce a new method called Knowledge-Guided Wasserstein DRO (KG-WDRO). It allows models to intelligently incorporate insights from previous related datasets (called "prior knowledge") without blindly copying them. This makes the model less conservative and better suited for small, real-world datasets. Our approach guides model uncertainty by trusting past knowledge more in areas it is confident about, and less in unfamiliar territory. We show that KG-WDRO improves predictions compared to existing methods — especially when labeled data is scarce. This technique can help machine learning systems adapt better across domains like medicine, finance, and the social sciences, where transferring knowledge wisely is key.
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: Wasserstein distributionally robust optimization, knowledge-guided learning, difference-of-convex optimization, shrinkage-based transfer learning
Submission Number: 5737