Generalizable Active Learning: Boosting Out-of-Distribution Generalization in Active Learning with Simulated Generalization via Augmentation

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Active Learning, Out-of-Distribution Generalization, Simulated Generalization
Abstract: Active Learning (AL) aims to select informative samples for annotation within a constrained labeling budget. Existing AL methods typically focus on achieving high performance on Independently and Identically Distributed (IID) source data and often terminate once IID performance converges. However, we identify a crucial yet under-explored issue: despite IID convergence, models trained under AL methods often exhibit a significant performance gap in Out-Of-Distribution (OOD) scenarios compared to models trained on the full labeled dataset, and closing this OOD gap often requires a much larger labeling budget. To address this issue, we introduce the task of Generalizable Active Learning (GAL), which aims to improve OOD generalization while preserving source-domain performance and minimizing additional labeling costs. We further propose Simulated Generalization Active Learning (SimGAL), a framework that simulates generalization scenarios through data augmentation without incurring extra annotations. SimGAL comprises: (1) Simulated Generalization Augmentation (SGA), which generates augmented samples that simulate OOD characteristics from the labeled pool, and (2) a Quality Stabilization Module (QSM), which filters out overly distorted augmented samples to ensure stable training. We design two train-test paradigms tailored to the GAL task. Experimental results demonstrate that SimGAL significantly enhances the OOD generalization performance of AL methods under matched labeling budgets and training sample sizes.
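The abstract describes SimGAL only at a high level, so the following is a minimal Python sketch of how the SGA and QSM stages could fit together, assuming user-supplied augmentation and distortion-scoring functions; the names `augment`, `distortion_score`, and the threshold `tau` are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch only: SGA augments the labeled pool to mimic OOD shifts,
# then QSM drops overly distorted augmentations before training.
# `augment`, `distortion_score`, and `tau` are assumed placeholders.
from typing import Callable, List, Tuple


def sga_qsm_pipeline(
    labeled_pool: List[Tuple[object, int]],
    augment: Callable[[object], object],                   # SGA: OOD-style augmentation
    distortion_score: Callable[[object, object], float],   # QSM: how far the augmentation drifted
    tau: float = 0.5,                                       # assumed distortion threshold
    n_aug_per_sample: int = 2,
) -> List[Tuple[object, int]]:
    """Simulated Generalization Augmentation (SGA) followed by the
    Quality Stabilization Module (QSM), per the abstract's description:
    augmented samples reuse existing labels, so no extra annotation cost."""
    augmented = []
    for x, y in labeled_pool:
        for _ in range(n_aug_per_sample):
            x_aug = augment(x)                      # SGA step
            if distortion_score(x, x_aug) <= tau:   # QSM filter
                augmented.append((x_aug, y))        # keep only stable augmentations
    return labeled_pool + augmented
```

In this reading, the filtered, augmented pool would then be used to train the task model within the usual AL loop, which matches the abstract's claim of improving OOD generalization under a matched labeling budget.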
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 7554