The Cost of Replicability in Active Learning

TMLR Paper5419 Authors

18 Jul 2025 (modified: 20 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Active learning aims to reduce the number of labeled data points required by machine learning algorithms by selectively querying labels from initially unlabeled data. Ensuring replicability, where an algorithm produces consistent outcomes across different runs, is essential for the reliability of machine learning models but often increases sample complexity. This report investigates the cost of replicability in active learning using two classical disagreement-based methods: the CAL and A\textsuperscript{2} algorithms. Leveraging random thresholding techniques, we propose two replicable active learning algorithms: one for realizable learning of finite hypothesis classes and another for the agnostic setting. Our theoretical analysis shows that while enforcing replicability increases label complexity, CAL and A\textsuperscript{2} still achieve substantial label savings under this constraint. These findings provide key insights into balancing efficiency and stability in active learning.
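To make the disagreement-based querying idea concrete, below is a minimal Python sketch of plain (non-replicable) CAL on a finite class of one-dimensional threshold classifiers. The function names (`cal_active_learning`, `label_oracle`), the threshold-grid hypothesis class, and the target concept are illustrative assumptions, not the paper's implementation; in particular, the paper's replicable variants additionally apply random thresholding with shared randomness, which this sketch omits.

```python
import random

# Illustrative sketch of CAL-style disagreement-based active learning on a
# finite hypothesis class of 1-D thresholds h_t(x) = 1[x >= t]. This is NOT
# the paper's replicable algorithm: the replicable variants additionally use
# random thresholding with shared randomness, which is omitted here.

def cal_active_learning(pool, label_oracle, hypotheses):
    """Query a label only when the current version space disagrees on x."""
    version_space = list(hypotheses)  # hypotheses consistent with labels so far
    num_queries = 0
    for x in pool:
        predictions = {int(x >= t) for t in version_space}
        if len(predictions) > 1:           # x lies in the disagreement region
            y = label_oracle(x)            # label is informative: query it
            num_queries += 1
            version_space = [t for t in version_space if int(x >= t) == y]
        # otherwise all surviving hypotheses agree, so the label is implied
    return version_space, num_queries

if __name__ == "__main__":
    rng = random.Random(0)
    true_threshold = 0.37                        # hypothetical target concept
    oracle = lambda x: int(x >= true_threshold)  # realizable labels
    pool = [rng.random() for _ in range(200)]
    grid = [i / 100 for i in range(101)]         # finite hypothesis class
    survivors, queries = cal_active_learning(pool, oracle, grid)
    print(f"queried {queries} of {len(pool)} labels; "
          f"{len(survivors)} hypotheses remain in the version space")
```

On a random stream, CAL typically queries only a logarithmic fraction of the pool for threshold classes, which is the kind of label saving the abstract says survives (at some cost) under the replicability constraint.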
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The main changes since the last submission are as follows:
1. Restructured the paper: moved all proofs and technical details to the appendix, significantly shortening the main paper.
2. Added a Symbols Table to the appendix for ease of reference.
3. Revised the description of our algorithms to clarify certain algorithmic choices.
4. Expanded the "Conclusion and Future Works" section to address concerns raised by the reviewers.
Assigned Action Editor: ~Chicheng_Zhang1
Submission Number: 5419