Abstract: Many real-world classification tasks involve datasets with large and imbalanced label spaces,
making class-specific uncertainty quantification particularly challenging.
Conformal Prediction (CP) provides a model-agnostic framework with a formal coverage guarantee: its prediction sets contain the true label with a user-specified probability (the confidence level). However, standard class-conditional
methods often fail when data is scarce for some classes. We propose a method
that uses domain knowledge or label hierarchies to dynamically group semantically
related classes so that the desired coverage is met at a given confidence level.
Our method maintains class-conditioned calibration when possible and provides
group-conditioned guarantees where necessary.
We evaluate our method on outcome diagnoses prediction, an important clinical task
that not only benefits from robust uncertainty estimation but also exhibits a highly imbalanced label distribution.
We conduct experiments on three clinical datasets, using two medical taxonomies (ICD-10 and CCSR)
and label spaces of varying sizes, the largest comprising more than 1,000 classes.
Our results show that the proposed approach consistently improves class-conditional coverage for infrequent diagnoses,
outperforming strong baselines in all settings. By improving coverage
for underrepresented classes, our method enhances the reliability and trustworthiness of predictive models.
This improvement is especially valuable in clinical applications, where failure to detect rare but serious conditions can lead to
harmful consequences.
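For illustration, the following is a minimal sketch (not the paper's released implementation) of class-conditional split conformal calibration with a group-level fallback for rare classes. The nonconformity score (1 minus the softmax probability of the true class), the `min_count` cutoff, and the `class_to_group` mapping are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def calibrate_thresholds(scores, labels, class_to_group, alpha=0.1, min_count=30):
    """Per-class conformal thresholds, falling back to the class's semantic group
    (e.g. an ICD-10 chapter) when a class has too few calibration examples.

    scores: (n,) nonconformity score of the true label for each calibration point
    labels: (n,) integer class labels
    class_to_group: dict mapping class id -> group id
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    thresholds = {}
    for c in np.unique(labels):
        cls_scores = scores[labels == c]
        if len(cls_scores) < min_count:
            # Too few class-specific examples: pool scores over the whole group,
            # trading class-conditional for group-conditional coverage.
            members = [k for k, g in class_to_group.items()
                       if g == class_to_group[c]]
            cls_scores = scores[np.isin(labels, members)]
        n = len(cls_scores)
        # Finite-sample conformal quantile at level ceil((n+1)(1-alpha))/n.
        q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        thresholds[c] = np.quantile(cls_scores, q, method="higher")
    return thresholds

def predict_set(prob_row, thresholds):
    """Include every class whose nonconformity score falls below its threshold."""
    return [c for c, t in thresholds.items() if 1.0 - prob_row[c] <= t]
```

Under exchangeability, pooling calibration scores within a group yields the group-conditioned guarantee mentioned above, while classes with enough calibration data keep their own, stricter class-conditional threshold.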
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=O4jRQGJ7OO
Changes Since Last Submission: - Removed unused packages, especially geometry, which changed the margin size.
Assigned Action Editor: ~Jake_Snell1
Submission Number: 5581