Keywords: Uncertainty Quantification, Large Language Models, Conformal Prediction
TL;DR: We propose paraphrase-robust conformal prediction for LLMs, achieving valid coverage and compact prediction sets that remain robust under adversarial prompt rewording.
Abstract: Uncertainty quantification (UQ) provides interpretable measures of predictive confidence and supports reliable decision-making with large language models (LLMs). However, existing UQ methods are often neither statistically rigorous nor robust to paraphrase variations. To address these limitations, we propose a new framework for paraphrase-robust UQ, which builds on conformal prediction to ensure valid coverage and introduces a paraphrase-aware nonconformity score to enhance robustness. The score is derived by generating independent semantic paraphrases of each query, training an ancillary model that both approximates and robustifies the predictive distribution, and aggregating variability across these paraphrases. On five general multiple-choice Question Answering (MCQA) datasets and two medical MCQA datasets with $\texttt{Qwen2.5-7B}$, our method achieves nominal coverage with compact prediction sets and demonstrates improved robustness to paraphrase shifts across different rewording settings. The results also generalize to $\texttt{Llama-3.1-8B}$ and $\texttt{Phi-3-small}$, underscoring the reliability of the framework across model families. Code is available at https://anonymous.4open.science/r/paraphrase_uq-FDD8.
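Illustrative sketch (not the authors' released code): the snippet below shows, under simplifying assumptions, how a paraphrase-aggregated nonconformity score could plug into split conformal prediction for MCQA. The helpers `option_probs` (answer-option probabilities from an LLM) and `paraphrase` (semantic rewordings of a query) are hypothetical placeholders, and the ancillary model described in the abstract is omitted for brevity.

```python
# Minimal sketch: split conformal prediction with a paraphrase-aggregated
# nonconformity score for multiple-choice QA. `option_probs` and `paraphrase`
# are assumed, hypothetical helpers; this is not the paper's implementation.
import numpy as np

def nonconformity(question, label_idx, n_paraphrases=5):
    """Average (1 - probability of the given option) over the query and its paraphrases."""
    variants = [question] + paraphrase(question, n=n_paraphrases)
    scores = [1.0 - option_probs(v)[label_idx] for v in variants]
    return float(np.mean(scores))

def calibrate(cal_questions, cal_labels, alpha=0.1):
    """Compute the conformal threshold on a held-out calibration set."""
    scores = np.array([nonconformity(q, y) for q, y in zip(cal_questions, cal_labels)])
    n = len(scores)
    # Finite-sample corrected quantile level for (1 - alpha) marginal coverage.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(question, num_options, q_hat):
    """Include every answer option whose score falls below the calibrated threshold."""
    return [k for k in range(num_options) if nonconformity(question, k) <= q_hat]
```

Averaging the score over paraphrases is one simple way to aggregate variability across rewordings; the calibration step is standard split conformal prediction and retains its coverage guarantee as long as the aggregated score is used consistently at calibration and test time.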
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 22391