Leveraging Hierarchical Structure for Multi-Domain Active Learning with Theoretical Guarantees

Guang-Yuan Hao; Haotian Wang; Hengguan Huang; Jie Gao; Hao Wang

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone

Keywords: Active Learning, Multi-Domain Learning

TL;DR: We formalize the general definition of multi-domain active learning and propose Composite Active Learning (CAL) as the first general deep AL method for addressing this problem with theoretical guarantees by leveraging hierarchical structure.

Abstract: Active learning (AL) aims to improve model performance within a fixed labeling budget by choosing the most informative data points to label. Existing AL methods focus on the single-domain setting, where all data come from the same domain (e.g., the same dataset). However, many real-world tasks involve multiple domains. For example, in visual recognition, it is often desirable to train an image classifier that works across different environments (e.g., different backgrounds), where the images from each environment constitute one domain. This multi-domain AL setting is challenging for prior methods because they (1) ignore the similarity among domains when assigning the labeling budget and (2) fail to handle the distribution shift of data across domains. In this paper, we propose the first general method for multi-domain AL, dubbed composite active learning (CAL). Our approach explicitly considers the hierarchical structure of the problem, i.e., its domain-level and instance-level structures. CAL first assigns domain-level budgets according to domain-level importance, which is estimated by optimizing an upper error bound that we develop. Given the domain-level budgets, CAL then applies an instance-level query strategy to select the samples to label from each domain. Our theoretical analysis shows that our method achieves a better error bound than current AL methods. Our empirical results demonstrate that our approach significantly outperforms state-of-the-art AL methods on both synthetic and real-world multi-domain datasets.
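The two-level selection described in the abstract (domain-level budgets first, then instance-level queries within each domain) can be illustrated with a minimal sketch. This is not the authors' implementation: the importance scores are taken as given rather than estimated from the paper's error bound, proportional allocation is an assumed rule, and predictive entropy stands in for the unspecified instance-level query strategy. The names `allocate_budget`, `entropy`, and `select_queries` are hypothetical.

```python
import numpy as np


def allocate_budget(importance, total_budget):
    """Split total_budget across domains in proportion to importance.

    Assumption: proportional allocation with largest-remainder rounding.
    The paper derives importance from an error bound; here it is an input.
    """
    w = np.asarray(importance, dtype=float)
    w = w / w.sum()
    raw = w * total_budget
    budgets = np.floor(raw).astype(int)
    remainder = int(total_budget - budgets.sum())
    # Give leftover labels to the domains with the largest fractional parts.
    order = np.argsort(-(raw - budgets), kind="stable")
    budgets[order[:remainder]] += 1
    return budgets


def entropy(probs):
    """Predictive entropy of each row of class probabilities."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_queries(domain_probs, importance, total_budget):
    """Pick, per domain, the indices of the most uncertain unlabeled samples.

    domain_probs: list with one (n_samples, n_classes) array per domain.
    Entropy sampling is a stand-in for any instance-level query strategy.
    """
    budgets = allocate_budget(importance, total_budget)
    picks = []
    for probs, b in zip(domain_probs, budgets):
        b = min(int(b), len(probs))  # cannot query more than is available
        scores = entropy(probs)
        picks.append(np.argsort(-scores, kind="stable")[:b])
    return budgets, picks
```

In this sketch, any other uncertainty or diversity score could replace `entropy` without changing the domain-level step, which mirrors the abstract's separation of the two levels.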

Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)
