Keywords: active learning, protein property prediction, evaluation, benchmark
Abstract: We highlight that current evaluations of active-learning methods often fail to reflect important aspects of real-world applications, giving an incomplete picture of how methods behave in practice. Most notably, evaluation problems are commonly constructed from heavily curated datasets, limiting their ability to rigorously stress-test data acquisition: even the worst acquirable data in these datasets is often reasonably useful for the task at hand. To address this, we introduce Active Learning on Protein Sequences (ALPS), a set of problems constructed to test key challenges that active-learning methods need to handle in real-world settings. We use ALPS to assess several previously successful methods, revealing a number of interesting behaviours and methodological issues. The ALPS codebase serves to support straightforward extensions of our evaluations in future work.
Primary Area: datasets and benchmarks
Submission Number: 18111