Automating Active Labelling with Greedy Silhouette Search

ICLR 2026 Conference Submission 16243 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Active Learning, AutoML, Clustering
TL;DR: This paper proposes the Silhouette search algorithm, which selects samples from a dataset so as to optimise the macro-averaged medoid Silhouette.
Abstract: Labelling data is expensive, making active learning especially valuable in low-budget settings where only a few samples can be annotated. However, existing methods often rely on delicate and complex hyper-parameter tuning, which in turn typically requires labelled validation data. We introduce Greedy Silhouette Search (GSS), a practical and robust method that leverages the Silhouette clustering metric to guide both sample selection and hyper-parameter configuration. We prove a bound on the generalisation error of the 1-Nearest Neighbour (1-NN) classifier when labels are generated by GSS. Experiments demonstrate that GSS achieves competitive performance compared to baselines that require extensive tuning, making it a strong candidate for real-world, resource-constrained applications.
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16243
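
To make the idea in the TL;DR and abstract concrete, the following is a minimal sketch of greedily choosing samples to maximise a macro-averaged medoid Silhouette. It is an illustration under stated assumptions, not the authors' implementation: the function names `macro_medoid_silhouette` and `greedy_select` are hypothetical, the per-point medoid Silhouette is taken to be 1 - d1/d2 (d1 and d2 being the distances to the nearest and second-nearest selected sample), "macro-averaged" is assumed to mean averaging the per-cluster means, and the first sample is assumed to be the dataset medoid because the score needs at least two selected samples.

```python
import numpy as np

def macro_medoid_silhouette(D, medoids):
    """Macro-averaged medoid Silhouette (assumed definition, see lead-in).

    D is an (n, n) pairwise distance matrix; medoids is a list of at
    least two row indices acting as cluster centres.
    """
    d = np.asarray(D, dtype=float)[:, medoids]   # distances to each medoid, shape (n, k)
    nearest = d.argmin(axis=1)                   # cluster assignment of every point
    d_sorted = np.sort(d, axis=1)
    d1, d2 = d_sorted[:, 0], d_sorted[:, 1]      # nearest and second-nearest distances
    # Ratio d1/d2, set to 1 where d2 == 0 so the point scores 0 (assumed convention).
    ratio = np.divide(d1, d2, out=np.ones_like(d1), where=d2 > 0)
    s = 1.0 - ratio
    # Macro average: mean of the per-cluster mean scores, skipping empty clusters.
    cluster_means = [s[nearest == c].mean()
                     for c in range(len(medoids)) if np.any(nearest == c)]
    return float(np.mean(cluster_means))

def greedy_select(D, budget):
    """Greedily pick up to `budget` sample indices that maximise the score."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Assumed initialisation: the point with the smallest total distance to all others.
    selected = [int(np.argmin(D.sum(axis=1)))]
    while len(selected) < min(budget, n):
        best, best_score = None, -np.inf
        for c in range(n):
            if c in selected:
                continue
            score = macro_medoid_silhouette(D, selected + [c])
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected

# Toy usage: two well-separated groups of points on a line.
X = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(X[:, None] - X[None, :])
to_label = greedy_select(D, budget=2)
print(to_label, macro_medoid_silhouette(D, to_label))
```

On this toy data the two selected indices fall in different groups, which is the behaviour one would want from a label-selection step before fitting a 1-NN classifier on the labelled samples. Each greedy step in the sketch rescans every candidate, so it is only practical for small datasets.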