nnActive: A Framework for Evaluation of Active Learning in 3D Biomedical Segmentation

TMLR Paper4847 Authors

13 May 2025 (modified: 23 May 2025) | Under review for TMLR | CC BY 4.0
Abstract: Semantic segmentation is crucial for various biomedical applications, yet its reliance on large, annotated datasets presents a significant bottleneck due to the high cost and specialized expertise required for manual labeling. Active Learning (AL) aims to mitigate this challenge by selectively querying the most informative samples, thereby reducing annotation effort. However, in the domain of 3D biomedical imaging, there remains no consensus on whether AL consistently outperforms Random sampling strategies. Current methodological assessment is hindered by the widespread occurrence of four pitfalls in AL method evaluation: (1) restriction to too few datasets and annotation budgets, (2) training 2D models on 3D images and not incorporating partial annotations, (3) a Random baseline that is not adapted to the task, and (4) measuring annotation cost only in voxels. In this work, we introduce nnActive, an open-source AL framework that systematically overcomes these pitfalls by (1) conducting a large-scale study evaluating 8 query methods (QMs) on four biomedical imaging datasets and three label regimes, accompanied by four large-scale ablation studies, (2) extending the state-of-the-art 3D medical segmentation method nnU-Net to train on partial annotations with 3D patch-based query selection, (3) proposing Foreground Aware Random sampling strategies that tackle the foreground-background class imbalance commonly encountered in 3D medical images, and (4) proposing the foreground efficiency metric, which captures that the annotation cost of background regions is very low compared to that of foreground regions.
We reveal the following key findings: (1) while all AL methods outperform standard Random sampling, none reliably surpasses an improved Foreground Aware Random sampling; (2) the benefits of AL depend on task-specific parameters such as the number of classes and their locations; (3) Predictive Entropy is overall the best-performing AL method, but likely requires the most annotation effort; (4) AL performance can be improved with more compute-intensive design choices such as longer training and smaller query sizes. As a holistic, open-source framework, nnActive has the potential to act as a catalyst for research and application of AL in 3D biomedical imaging. Code is at: https://anonymous.4open.science/r/nnactive-815F
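To make the idea of patch-based query selection with Predictive Entropy concrete, the following is a minimal, illustrative sketch and not the nnActive implementation. It assumes each candidate 3D patch is given as a flat list of per-voxel class-probability vectors, and that a patch is scored by the mean voxel entropy; the function names (`predictive_entropy`, `patch_score`, `select_patches`) and the mean aggregation are assumptions made here for illustration only.

```python
import math
from typing import Dict, List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one voxel's class-probability vector."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_score(voxel_probs: Sequence[Sequence[float]]) -> float:
    """Mean voxel entropy over a patch (mean aggregation is an assumption)."""
    return sum(predictive_entropy(p) for p in voxel_probs) / len(voxel_probs)


def select_patches(
    patches: Dict[str, List[List[float]]], query_size: int
) -> List[str]:
    """Return the IDs of the `query_size` patches with the highest score."""
    ranked = sorted(
        patches, key=lambda pid: patch_score(patches[pid]), reverse=True
    )
    return ranked[:query_size]


if __name__ == "__main__":
    # Hypothetical toy data: two patches with two voxels and two classes each.
    toy = {
        "confident": [[0.99, 0.01], [0.98, 0.02]],
        "uncertain": [[0.5, 0.5], [0.6, 0.4]],
    }
    print(select_patches(toy, query_size=1))
```

In this toy setting the patch whose voxel predictions are closest to uniform is queried first. A smaller `query_size` corresponds to the smaller query sizes mentioned in finding (4), at the cost of more retraining rounds.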
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=IzoOKomdhV&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: Overview of changes from the previous submission: full author list; corrected margins; removed all wrapfigures, wraptables, and landscape-format pages.
Assigned Action Editor: ~Jose_Dolz1
Submission Number: 4847