Beyond Loss Functions: Exploring Data-Centric Approaches with Diffusion Model for Domain Generalization
Abstract: Considerable effort has gone into tackling the Domain Generalization (DG) problem, with a focus on developing new loss functions. Inspired by the image generation capabilities of diffusion models, we pose a pivotal question: Can diffusion models function as data augmentation tools to address DG from a data-centric perspective, rather than relying on loss functions? Our findings reveal that trivial cross-domain data augmentation (CDGA), combined with vanilla ERM and using readily available diffusion models without additional finetuning, outperforms state-of-the-art (SOTA) training algorithms.
This paper explores why and how this rudimentary data generation can outperform complicated DG algorithms. With the help of domain shift quantification tools, we empirically show that CDGA reduces the domain shift between domains. We empirically reveal connections between the loss landscape, adversarial robustness, and data generation, illustrating that CDGA reduces loss sharpness and improves robustness against adversarial shifts in data. Additionally, we discuss our intuition that CDGA along with ERM can be considered a way to replace the pointwise kernel estimates in ERM with new density estimates in the \textit{vicinity of domain pairs}, which can diminish the true data estimation error of ERM under domain shift. These insights advocate for further investigation into the potential of data-centric approaches in DG.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. We added Section 8 (pages 11 and 12) to justify that the performance gain in OOD generalization when using the Stable Diffusion model comes from the cross-domain transfer ability of CDGA, and not merely from the extra synthetic images produced by a pre-trained diffusion model.
2. We removed all statements in the paper claiming that sampling from the LDM is theoretically equivalent to sampling from the density defined by $K(\cdot;P_{i\to j}(x_{k}^{i}))$.
3. We modified the equation above equation 8 by adding the normalizing constant.
4. We shifted the sections on loss function landscape analysis and adversarial robustness to the appendix.
5. We moved the mitigating class imbalance section from the appendix to the main paper.
6. We added a discussion comparing SDGA and CDGA. This part considers the advantages of each method relative to the other and can be found on page 11 (Section 7.2).
7. We moved the first page of Appendix A to the main paper.
Supplementary Material: pdf
Assigned Action Editor: ~Pin-Yu_Chen1
Submission Number: 3092