Multivariate Conformal Selection

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 poster
License: CC BY-NC-ND 4.0
TL;DR: We propose Multivariate Conformal Selection (mCS), a framework extending Conformal Selection to multivariate settings, achieving finite-sample False Discovery Rate control and enhanced selection power with applications in drug discovery and beyond.
Abstract: Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this, we propose Multivariate Conformal Selection (mCS), a generalization of CS designed for multivariate response settings. Our method introduces regional monotonicity and employs multivariate nonconformity scores to construct conformal $p$-values, enabling finite-sample False Discovery Rate (FDR) control. We present two variants: $\texttt{mCS-dist}$, using distance-based scores, and $\texttt{mCS-learn}$, which learns optimal scores via differentiable optimization. Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining FDR control, establishing it as a robust framework for multivariate selection tasks.
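The following is a minimal sketch of the generic pipeline the abstract alludes to, not the authors' implementation. In the original Conformal Selection framework, conformal $p$-values are computed by ranking each test score against calibration scores and are then passed to the Benjamini-Hochberg (BH) procedure; the sketch assumes mCS follows the same pattern once a multivariate nonconformity score has reduced each response to a scalar. The function names, the convention that smaller scores yield smaller $p$-values, and the toy inputs are all assumptions made for illustration; see the linked repository for the actual $\texttt{mCS-dist}$ and $\texttt{mCS-learn}$ scores.

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """Rank each test score among the calibration scores.

    Assumed convention (hypothetical, for illustration): a small test
    score relative to the calibration scores gives a small p-value.
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # Number of calibration scores strictly below each test score.
    n_below = np.searchsorted(calib, test, side="left")
    return (1.0 + n_below) / (len(calib) + 1.0)


def benjamini_hochberg(p_values, q):
    """Return sorted indices selected by the BH procedure at level q."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1  # largest rank whose p-value clears its threshold
    return np.sort(order[:k])


# Toy scalar scores; in a multivariate setting these would come from a
# multivariate nonconformity score (not reproduced here).
calib = [1, 2, 3, 4, 5, 6, 7, 8, 9]
test = [0.5, 4.5, 10.0]
p = conformal_p_values(calib, test)
selected = benjamini_hochberg(p, q=0.5)
```

Here `q` plays the role of the nominal FDR level, and `selected` holds the indices of the test candidates that would be reported as discoveries under these assumptions.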
Lay Summary: We've developed a new statistical method called Multivariate Conformal Selection (mCS) to help identify valuable candidates from large datasets when several properties matter at once. Think of it like sifting through thousands of potential new drugs to find the best ones based on several desirable properties, not just one. Existing methods, like Conformal Selection (CS), work well for single-factor decisions but do not handle multi-factor choices. Our mCS method extends CS by introducing a new way to measure how "unusual" a candidate is, which lets us control the rate of false discoveries: the expected share of duds among the candidates we select stays below a level chosen in advance. We offer two versions: $\texttt{mCS-dist}$, which uses straightforward distance calculations, and $\texttt{mCS-learn}$, which uses machine learning to find the best way to evaluate candidates. In our experiments, both versions significantly improve our ability to pick the right candidates while keeping errors low. This makes mCS a useful tool for tasks like drug discovery and refining large AI models.
Link To Code: https://github.com/Tian-Bai/mcs
Primary Area: General Machine Learning->Supervised Learning
Keywords: Uncertainty Quantification, Machine Learning, Model-free Selective Inference, Conformal Inference, Multivariate Responses
Submission Number: 7665