% \vspace{-2ex}
\section{Related Work}
%\hooman{we have a in-detail background section, so i think maybe related work should go at the end?}\devjeet{We might still end up needing a related work since the background is now super compressed, compared to before}
%\vspace{1ex}

\noindent {\bf Conformal prediction.} In recent years, CP has emerged as an effective framework for uncertainty quantification in both regression~\cite{romano2019conformalized,papadopoulos2002inductive} and classification tasks~\cite{romano2020classification,angelopoulos2020uncertainty,sadinle2019least}. Much recent work in the CP community has focused on reducing set sizes~\cite{romano2020classification,angelopoulos-sets,sadinle2019least} and on constructing CP procedures for specific usage contexts~\cite{huang2024uncertainty}. Some works also focus on providing conditional coverage guarantees~\cite{Gibbs2023-ax,jung2022batch,tibshirani2019conformal}. Conformal risk control extends CP to control generalized risks on prediction sets~\cite{angelopoulos2022conformal,bates2021distribution}. Learn-Then-Test~\cite{angelopoulos2021learn} is a framework for risk-controlled prediction sets that does not require monotonic risks; CLM~\cite{Quach2023-mq} uses it to generate prediction sets from LLMs.

%\vspace{1ex}


% \citet{wang2022probabilistic} propose to construct discontinuous prediction sets for conditional generative models with continuous output spaces based on a distance metric in the output space. Their method also relies on the sampling distribution of the model, but does not work for discrete spaces which is the focus of our work. .

\noindent {\bf Uncertainty estimation for LLMs.} Work on uncertainty estimation for LLMs has largely focused on calibrating output probabilities~\cite{jiang2021can,desai2020calibration,lin2023generating,kuhn2023semantic}, verbalization~\cite{kadavath2022language}, and Bayesian approaches~\cite{malinin2020uncertainty, ryabinin2021scaling}. Deep classifiers can exhibit high levels of calibration error~\cite{jiang2021can}. CP has recently been applied to generative models, especially LLMs, in settings with bounded output spaces, such as multiple-choice question answering~\cite{kumar2023conformal,zhang2021less,rouzrokh2024conflare,li2024traq}. Lastly, another line of work focuses on selecting a subset of an LLM's generated output with factuality guarantees~\cite{mohri2024language,cherian2024large}.
% The recent adoption of post training alignment procedures, such as reinforcement learning from human feedback~\cite{christiano2017deep}, can further exacerbate this issue~\cite{achiam2023gpt}. 
% Kuhn et al.,~\cite{kuhn2023semantic} posit that, even when properly calibrated, surface form competition~\cite{holtzman2021surface} can lead to diffused likelihoods amongst semantically equivalent forms, and propose to cluster these forms to produce more accurate output probabilities. 
% Another line of work focuses on Bayesian uncertainty estimation~\cite{malinin2020uncertainty, ryabinin2021scaling}.


% \vspace{-2ex}
\section{Summary}

This paper introduced \methodname, a novel conformal prediction algorithm that produces marginally valid prediction sets for deep generative models with unbounded output spaces. %We calibrate a stopping rule by modeling the number of samples until an admissible solution is reached as a geometric distribution. This allows us to estimate conditional quantiles in parametric form without requiring any distributional assumptions on the underlying model. 
Our score formulation lets \methodname\ abstain selectively on specific examples, so that it achieves lower abstention rates and higher non-abstention coverage at $\alpha$ levels at which prior work always abstains. %Moreover, \methodname\ predicts a stopping rule apriori, without the need for iterative sampling, making it suitable for modern inference pipelines. 
Our experiments show that \methodname\ indeed achieves higher non-abstention coverage and lower abstention rates, while maintaining parity with CLM in set size at higher $\alpha$ levels. %Future work includes extending CP to settings that involve dependent sampling, including multi-turn inference, and agentic problem solving.