Active Question Learning: learning a partial annotation policy in hierarchical label spaces

Ignacio Laurenty; Sanjeel Parekh; Ekhine Irurozki; Florence d'Alché-Buc

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Active Learning, Multi-Armed Bandit, Partial Label Learning, Hierarchical class

TL;DR: We propose a new active learning task designed for the construction of partially annotated datasets using class hierarchies as a basis for labelling.

Abstract: Active learning (AL) aims to reduce data annotation cost by choosing which samples to annotate, without sacrificing classifier accuracy. This entails strategically selecting the most informative or uncertain data points for annotation, which makes the learning process more cost-effective. To our knowledge, no existing approach takes into account the diversity of annotator expertise or the opportunity to rely on partial labeling instead of full labeling. For many multi-class classification problems, a hierarchical taxonomy can be defined by grouping classes by similarity, with the original classes as the leaves and subsets of classes (i.e., composite classes) as the internal nodes. We propose to leverage this hierarchy by allowing annotators to partially label data using composite classes as questions. We posit that questions higher up in the hierarchy require less expertise and hence incur lower annotation cost. To this end, we introduce a novel AL task, Active Question Learning (AQL), in which an agent decides which questions to ask in order to annotate a sample given the current state of the classifier. Casting this task as a Multi-Armed Bandit problem in which each arm corresponds to a question associated with a (composite or atomic) class, we propose AQTS, a contextual Thompson Sampling algorithm to solve it. We demonstrate the efficacy of our approach on standard image classification datasets.
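To make the setting concrete, the following is a minimal sketch of the two ingredients the abstract describes: questions over composite classes that yield partial labels, and a bandit that picks which question to ask. It is not the authors' AQTS algorithm. It uses a plain (non-contextual) Beta-Bernoulli Thompson Sampling rule as a simplified stand-in, and the toy hierarchy, the names `HIERARCHY`, `answer`, `update_candidates`, and `BetaThompson`, and the binary reward are all assumptions made for illustration.

```python
import random

# Hypothetical toy hierarchy: each question is a class, mapped to the set of
# leaf classes it covers. Composite classes cover several leaves.
HIERARCHY = {
    "animal": {"cat", "dog"},
    "vehicle": {"car", "truck"},
    "cat": {"cat"},
    "dog": {"dog"},
    "car": {"car"},
    "truck": {"truck"},
}


def answer(question, true_label):
    """Simulated annotator: does the true label belong to the queried class?"""
    return true_label in HIERARCHY[question]


def update_candidates(candidates, question, is_yes):
    """Narrow the partial label (a set of candidate leaves) given a yes/no answer."""
    members = HIERARCHY[question]
    return candidates & members if is_yes else candidates - members


class BetaThompson:
    """Non-contextual Beta-Bernoulli Thompson Sampling with one arm per question."""

    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior, i.e. uniform, on each arm's success probability.
        self.alpha = {arm: 1.0 for arm in arms}
        self.beta = {arm: 1.0 for arm in arms}

    def select(self):
        # Draw one posterior sample per arm and play the arm with the largest draw.
        draws = {
            arm: self.rng.betavariate(self.alpha[arm], self.beta[arm])
            for arm in self.alpha
        }
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # reward is assumed to be 0 or 1 (an illustrative choice, not from the paper).
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


if __name__ == "__main__":
    leaves = {"cat", "dog", "car", "truck"}
    agent = BetaThompson(list(HIERARCHY))
    question = agent.select()
    is_yes = answer(question, true_label="dog")
    narrowed = update_candidates(leaves, question, is_yes)
    # Assumed reward: 1 if the answer shrank the candidate set, else 0.
    agent.update(question, int(len(narrowed) < len(leaves)))
    print(question, sorted(narrowed))
```

A contextual variant, as the abstract describes, would condition the choice of question on the sample and the current classifier state, and would presumably weigh the information gained against the cost of each question; this sketch leaves both out.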

Primary Area: general machine learning (i.e., none of the above)

Submission Number: 7485