Positive-First Most Ambiguous: A Simple yet Efficient Active Learning Criterion for Novel Class Retrieval

TMLR Paper3848 Authors

07 Jan 2025 (modified: 11 Mar 2025) | Rejected by TMLR | CC BY 4.0

Abstract: Novel Class Retrieval is increasingly important for exploring and exploiting the large amounts of unlabeled data now available. It is defined as the iterative construction of a novel, previously unknown class-of-interest from an initial query, with the help of human interaction. We formulate this problem as an Active Learning-based Relevance Feedback problem, in which a human-in-the-loop periodically labels a subset of the data to train a one-versus-all classifier. In this setting, the Active Learning strategy has a two-fold goal: rapidly fill the class-of-interest, and ensure that all class patterns are covered. However, most Active Learning methods aim only at improving classifier performance and do not consider these two aspects. To address this, we introduce a novel Active Learning criterion that balances classifier performance and class retrieval efficiency by selecting the most informative samples with the highest probability of being positive. We also formulate a novel coverage metric to evaluate retrieval performance. In addition to well-balanced datasets, we evaluate on real-world-like long-tailed datasets, which provide different degrees of class-of-interest imbalance. Results show that our criterion outperforms widely used strategies such as Most Ambiguous and Most Positive. We also provide a framework to help researchers create and experiment with new Active Learning methods in the context of Novel Class Retrieval.
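
The abstract names three selection strategies without giving formulas. The sketch below illustrates the two baselines in their usual form and one possible reading of a "positive-first, most ambiguous" rule. It is not the authors' implementation: the function names, the 0.5 threshold, and the ranking rule in `positive_first_most_ambiguous` are all assumptions made for illustration, and the paper's actual criterion may differ.

```python
import numpy as np


def most_positive(probs, k):
    """Baseline: the k samples with the highest positive-class probability."""
    return np.argsort(-probs, kind="stable")[:k]


def most_ambiguous(probs, k, threshold=0.5):
    """Baseline: the k samples whose probability is closest to the threshold."""
    return np.argsort(np.abs(probs - threshold), kind="stable")[:k]


def positive_first_most_ambiguous(probs, k, threshold=0.5):
    """Hypothetical reading of the criterion (an assumption, not the paper's formula):
    rank samples predicted positive by ambiguity first, then fall back to
    samples predicted negative, also ranked by ambiguity."""
    ambiguity = np.abs(probs - threshold)
    is_negative = (probs < threshold).astype(float)
    # np.lexsort uses the LAST key as the primary key:
    # predicted positives (0.0) come before negatives (1.0), ties broken by ambiguity.
    order = np.lexsort((ambiguity, is_negative))
    return order[:k]


# Toy one-versus-all classifier scores for six unlabeled samples (made-up values).
probs = np.array([0.95, 0.55, 0.45, 0.70, 0.10, 0.52])

picked_positive = most_positive(probs, 3)
picked_ambiguous = most_ambiguous(probs, 3)
picked_pfma = positive_first_most_ambiguous(probs, 3)
```

On this toy input, the ambiguity-only baseline spends part of its labeling budget on a sample predicted negative, while the positive-first variant keeps the budget on uncertain samples that are still predicted positive. This is the trade-off between classifier improvement and class filling that the abstract describes.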

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Sivan_Sabato1

Submission Number: 3848
