Enhancing Domain Generalization Performance in Low-Resource Setting via External Dataset and Pseudo Labeling With Sentence-BERT

Published: 2025 | Last Modified: 08 Jan 2026 | IEEE Access 2025 | CC BY-SA 4.0
Abstract: Recent studies on data augmentation have focused on improving model performance with limited training data within a specific dataset. While the goal is to enhance performance on that dataset itself, the approach also addresses broader challenges such as domain generalization. Building on this, we propose Out-of-Domain Pseudo Labeling (OOD-PL), a data augmentation technique designed to ensure data diversity and enhance the domain generalization of models in low-resource settings. Our approach introduces external data and assigns pseudo labels based on semantic vicinal interpolation with the intended training data. We observed significant improvements in domain generalization across three datasets from different domains. Unlike traditional methods, this approach utilizes out-of-domain samples as a form of augmentation for the training data. Our method can be flexibly integrated with existing augmentation techniques, and we demonstrate that it performs well even when the available training data is extremely limited. Furthermore, we conducted a range of in-depth analyses to strengthen the validity of the proposed method and demonstrate its robustness in enhancing domain generalization. As a result, we propose a methodology that overcomes the limitations of relying on a specific dataset, even when its availability is restricted, by leveraging out-of-domain samples.
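To make the core idea concrete, the sketch below illustrates one plausible way to assign pseudo labels to out-of-domain sentences using Sentence-BERT embeddings: each external sentence inherits the label of its most semantically similar labeled training example, with a similarity cut-off to discard unrelated samples. This is only an illustration; the encoder checkpoint (`all-MiniLM-L6-v2`), the threshold value, and the nearest-neighbour labeling rule are assumptions, and the paper's actual semantic vicinal interpolation procedure may differ.

```python
# Illustrative sketch (not the authors' implementation): pseudo labeling
# out-of-domain text by nearest-neighbour similarity in Sentence-BERT space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder checkpoint

# Small labeled in-domain set (toy example).
train_texts = ["the battery lasts all day", "screen cracked after a week"]
train_labels = ["positive", "negative"]

# Unlabeled external (out-of-domain) sentences to be pseudo-labeled.
ood_texts = ["service at the hotel was excellent", "the room was dirty and noisy"]

train_emb = model.encode(train_texts, convert_to_tensor=True, normalize_embeddings=True)
ood_emb = model.encode(ood_texts, convert_to_tensor=True, normalize_embeddings=True)

# Cosine similarity between every out-of-domain sentence and every labeled example.
sims = util.cos_sim(ood_emb, train_emb)  # shape: (num_ood, num_train)

SIM_THRESHOLD = 0.3  # assumed cut-off; low-similarity samples are discarded
pseudo_labeled = []
for i, text in enumerate(ood_texts):
    best = sims[i].argmax().item()
    if sims[i][best] >= SIM_THRESHOLD:
        # The external sentence inherits the label of its nearest in-domain neighbour.
        pseudo_labeled.append((text, train_labels[best]))

print(pseudo_labeled)
```

The pseudo-labeled pairs would then be appended to the low-resource training set before fine-tuning, which is how an approach of this kind could be combined with other augmentation techniques.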