Keywords: Bayesian Active Learning, Distribution Shift, Batch Acquisition
Abstract: The performance of machine learning models can decline significantly when they are evaluated on data exhibiting distribution shift. Although extensive research has addressed this problem through algorithm design, acquiring new data points to enlarge the training dataset has also been shown to be a promising solution. Starting from this idea, we build our research upon Bayesian active learning and propose a method that efficiently acquires samples from a candidate pool of diverse data sources to improve performance on the shifted target population. Specifically, our method designs a novel acquisition function that characterizes a Lower Bound of Batch Information Gain (LB-BatchIG) for the target distribution, and formulates batch acquisition as a submodular optimization problem. Solving this problem with a greedy algorithm determines the batch of data from the candidate pool to be annotated and used for training. Empirical studies on synthetic and real-world datasets, including tabular and image data, demonstrate that our batch acquisition algorithm yields greater performance improvement than competing algorithms.
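The greedy step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the LB-BatchIG formula, so the objective below is a stand-in: a facility-location coverage score, a standard monotone submodular function that rewards pool points lying close to target points. The names `similarity`, `coverage`, `greedy_batch`, and the `length_scale` parameter are hypothetical and are not taken from the paper.

```python
import math


def similarity(a, b, length_scale=1.0):
    """RBF similarity between two feature vectors (an assumed choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def coverage(selected, pool, targets):
    """Stand-in objective (NOT LB-BatchIG): facility-location score.

    For each target point, take its similarity to the closest selected
    pool point, then sum over targets. This set function is monotone
    and submodular.
    """
    if not selected:
        return 0.0
    return sum(max(similarity(t, pool[i]) for i in selected) for t in targets)


def greedy_batch(pool, targets, batch_size):
    """Greedily pick pool indices with the largest marginal gain."""
    selected = []
    current = 0.0
    for _ in range(min(batch_size, len(pool))):
        best_idx, best_gain = None, -1.0
        for i in range(len(pool)):
            if i in selected:
                continue
            # Marginal gain of adding candidate i to the current batch.
            gain = coverage(selected + [i], pool, targets) - current
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
        current += best_gain
    return selected
```

For a monotone submodular objective under a cardinality constraint, this greedy procedure is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the usual motivation for casting batch acquisition in submodular form. Calling `greedy_batch(pool, targets, batch_size)` with lists of feature tuples returns the indices of the pool points to send for annotation.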
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 22301