Efficient Multilabel Uncertainty Quantification with Conformal Ensembles

ICLR 2026 Conference Submission 17776 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Multilabel classification, Conformal prediction, Uncertainty estimation, Ensembles, Calibration, Reliable AI
Abstract: Multilabel classification (MLC) is challenging because labels are often correlated and decision boundaries are highly complex. Moreover, uncertainty quantification, which helps address sparse label combinations, remains underexplored in this setting. In many high-stakes domains, predictions must be not only accurate but also accompanied by quantified uncertainty to avoid missing critical cases. Conformal prediction (CP) offers distribution-free coverage guarantees, but when applied to individual models it can produce unstable or overly large prediction sets. Ensemble methods are a well-established way to improve stability and efficiency, yet their potential in multilabel settings has not been fully explored. We investigate ensemble conformal prediction for multilabel classification. Building on prior work on voting- and score-based ensembles, we adapt these strategies to the label-wise multilabel setting and conduct a systematic empirical study across multiple aggregation schemes: (i) majority voting, (ii) calibrated aggregation of nonconformity scores, and (iii) performance-weighted aggregation. Our theoretical perspective frames the independence assumptions and voting bounds in the multilabel ensemble setting, clarifying how coverage guarantees extend under majority voting. Across standard MLC benchmarks (COCO, Yeast, Emotions), our ensembles consistently improve over single-model CP, yielding more efficient prediction sets (smaller and more informative) while maintaining target coverage and achieving higher macro-F1 scores. Overall, the work provides a systematic study of ensemble aggregation methods for conformal prediction in multilabel classification, combining a theoretical perspective with a broad comparative evaluation.
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 17776
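
To make the majority-voting scheme (i) from the abstract concrete, here is a minimal sketch of label-wise split conformal prediction combined across ensemble members by majority vote. It is an illustration under stated assumptions, not the authors' implementation: the nonconformity score `1 - p`, the calibration on positive examples only, and the function names `labelwise_thresholds`, `predict_sets`, and `majority_vote` are all hypothetical choices made for this example.

```python
import numpy as np


def labelwise_thresholds(cal_probs, cal_labels, alpha):
    """Per-label conformal thresholds for ONE model (illustrative sketch).

    cal_probs:  (n, L) predicted probabilities on a held-out calibration set.
    cal_labels: (n, L) binary ground truth.
    Assumption: the nonconformity score of a positive label is 1 - p.
    """
    n_labels = cal_probs.shape[1]
    # Threshold 1.0 means "always include the label" (1 - p <= 1 always holds).
    thresholds = np.ones(n_labels)
    for j in range(n_labels):
        scores = 1.0 - cal_probs[cal_labels[:, j] == 1, j]
        n = scores.size
        if n == 0:
            continue  # no positives to calibrate on: stay conservative
        # Finite-sample corrected rank ceil((n + 1)(1 - alpha)).
        k = int(np.ceil((n + 1) * (1 - alpha)))
        if k > n:
            continue  # too few positives for this alpha: stay conservative
        thresholds[j] = np.sort(scores)[k - 1]
    return thresholds


def predict_sets(test_probs, thresholds):
    """Boolean (m, L) matrix: True where a label enters the prediction set."""
    return (1.0 - test_probs) <= thresholds


def majority_vote(member_sets):
    """Keep a label only if more than half of the ensemble members include it.

    member_sets: sequence of K boolean (m, L) matrices, one per member.
    """
    votes = np.sum(member_sets, axis=0)
    return votes > len(member_sets) / 2


if __name__ == "__main__":
    # Synthetic stand-in for K = 3 models: noisy views of the same labels.
    rng = np.random.default_rng(0)
    n_cal, n_test, n_labels, alpha = 200, 5, 4, 0.1
    y_cal = rng.integers(0, 2, size=(n_cal, n_labels))
    y_test = rng.integers(0, 2, size=(n_test, n_labels))

    member_sets = []
    for _ in range(3):
        p_cal = np.clip(0.7 * y_cal + rng.uniform(0, 0.3, y_cal.shape), 0, 1)
        p_test = np.clip(0.7 * y_test + rng.uniform(0, 0.3, y_test.shape), 0, 1)
        t = labelwise_thresholds(p_cal, y_cal, alpha)
        member_sets.append(predict_sets(p_test, t))

    print(majority_vote(member_sets).astype(int))
```

Each member is calibrated separately, so the vote operates on prediction sets rather than on raw scores. The score-based schemes (ii) and (iii) would instead aggregate nonconformity scores before thresholding, which this sketch does not cover.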