Budget-constrained Active Learning to De-censor Survival Data

Ali Parsaee; Bei Jiang; Russell Greiner

28 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Active Learning, Survival Analysis, Budgeted Constraints, Bayesian Model, Mutual Information, De-censoring Data

TL;DR: We develop a more general form of active learning for survival datasets that accounts for the available budget and the amount of label information that can be acquired, and we present theoretical and experimental results in this setting.

Abstract: Standard supervised learners attempt to learn a model from a labeled dataset. Given a small set of labeled instances and a pool of unlabeled instances, a budgeted learner can spend its budget to acquire the labels of some unlabeled instances, which it can then use to produce a model. Here, we explore budgeted learning in the context of survival datasets, which include (right) censored instances, where we know only a lower bound c_i on that instance’s time-to-event t_i. In this setting, the learner can pay to (partially) label a censored instance: e.g., to acquire the actual time t_i for an instance [e.g., go from (3yr, censored) to (7.2yr, uncensored)], or other variants [e.g., learn about 1 more year, so go from (3yr, censored) to either (3.2yr, uncensored) or (4yr, censored)]. This serves as a model of real-world data collection, where follow-up with censored patients does not always lead to complete uncensoring, and how much information the learner receives during data collection is a function of the budget and the nature of the data itself. Many fields, such as medicine, finance, and engineering, contain survival datasets with a large number of censored instances and also operate under budget constraints with respect to the learning process, which makes it important to be able to apply this budgeted learning approach. Despite this importance, to our knowledge no other work has looked into doing this. We provide both experimental and theoretical results for how to apply state-of-the-art budgeted learning algorithms to survival data, and the respective limitations that exist in doing so. Our approach provides bounds and time complexity theoretically equivalent to standard active learning methods. Moreover, empirical analysis on several survival tasks shows that our model performs better than other potential approaches that might be considered on several benchmarks.
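
The partial-labeling query described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SurvivalLabel` and `partial_decensor` are hypothetical, and the true event time is passed in only so the sketch can simulate the oracle that answers a follow-up query. It assumes a query buys a fixed amount of extra follow-up time, after which the instance is either uncensored (the event fell inside the window) or still censored at a later time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurvivalLabel:
    """Hypothetical label type: a time plus a censoring flag."""
    time: float      # event time if uncensored, otherwise a lower bound on it
    censored: bool   # True when only the lower bound is known


def partial_decensor(label: SurvivalLabel, true_event_time: float,
                     extra: float) -> SurvivalLabel:
    """Simulate following a censored instance for `extra` more time units.

    `true_event_time` is hidden from a real learner; it is an argument here
    only so that this sketch can play the role of the data-collection oracle.
    """
    if not label.censored:
        return label  # already fully labeled, nothing more to learn
    if true_event_time < label.time:
        raise ValueError("event time cannot precede the censoring time")
    horizon = label.time + extra
    if true_event_time <= horizon:
        # The event occurred within the purchased follow-up window.
        return SurvivalLabel(time=true_event_time, censored=False)
    # Still no event: the lower bound moves forward to the new horizon.
    return SurvivalLabel(time=horizon, censored=True)


start = SurvivalLabel(time=3.0, censored=True)
# One more year of follow-up, event at 3.2 years: becomes uncensored.
print(partial_decensor(start, true_event_time=3.2, extra=1.0))
# One more year of follow-up, event at 7.2 years: still censored, now at 4 years.
print(partial_decensor(start, true_event_time=7.2, extra=1.0))
# Unlimited follow-up corresponds to full de-censoring.
print(partial_decensor(start, true_event_time=7.2, extra=float("inf")))
```

Under these assumptions, the two possible outcomes of a one-year query on a 3-year censored instance match the abstract's example, and full de-censoring is the special case of an unbounded follow-up window.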

Primary Area: learning theory

Submission Number: 13185
