Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

Ruipeng Zhang; Ziqing Fan; Jiangchao Yao; Ya Zhang; Yanfeng Wang

Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Published: 16 Jan 2024, Last Modified: 21 Apr 2024ICLR 2024 posterEveryoneRevisionsBibTeX

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: generalization, sharpness-aware minimization, domain shift

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: The paper presents DISAM, an algorithm enhancing optimization under domain shifts by ensuring domain-level convergence consistency, resulting in quicker convergence, improved generalization, and outperforming existing methods.

Abstract: This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpness estimation to prevent the overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing variance in the domain loss, which allows the elastic gradient calibration in perturbation generation: when one domain is optimized above the averaging level w.r.t. loss, the gradient perturbation towards that domain will be weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM can achieve faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we show the superior efficiency of DISAM in parameter-efficient fine-tuning combined with the pretraining models. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 1547

Loading