Keywords: Conformal prediction, large language model, conditional validity
Abstract: Large language models (LLMs) face significant challenges in providing reliable uncertainty quantification for language generation. We introduce a novel conformal prediction framework designed to enhance this reliability through Collaborative Ranking and Dynamic Thresholds. Our method departs from traditional metrics by harnessing advanced LLM capabilities for comparative judgment, ranking candidate responses to form a robust, rank-based nonconformity score. This approach enables the construction of prediction sets with rigorous statistical guarantees that inherently adapt to diverse input difficulties and prompt complexities. Extensive experiments across varied question-answering domains consistently demonstrate significant improvements in conditional coverage, delivering precisely calibrated prediction sets even for tasks demanding extended reasoning and factual accuracy. Code with implementation details is available in the repository below: https://anonymous.4open.science/r/512499.
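The abstract's recipe can be illustrated with a minimal split-conformal sketch: score each calibration example by the rank that an LLM-based comparative judgment assigns to the reference answer, calibrate a quantile of those rank scores, and include in the prediction set every candidate ranked at or above that threshold. The `rank_candidates` callable, the calibration-example fields, and the quantile rule below are illustrative assumptions, not the authors' Collaborative Ranking and Dynamic Thresholds implementation (see the linked repository for that).

```python
import math
from typing import Callable, Dict, List

def rank_nonconformity(candidates: List[str], correct: str,
                       rank_candidates: Callable[[List[str]], List[int]]) -> int:
    """Nonconformity score = rank (0 = best) that the LLM's comparative
    judgment assigns to the reference answer among the candidates.
    Assumes `correct` appears in `candidates`."""
    ordering = rank_candidates(candidates)  # candidate indices, best first
    return ordering.index(candidates.index(correct))

def calibrate_threshold(cal_examples: List[Dict], rank_candidates,
                        alpha: float = 0.1) -> int:
    """Split-conformal calibration: use the ceil((n+1)(1-alpha))-th smallest
    rank score over the calibration set as the inclusion threshold."""
    scores = sorted(
        rank_nonconformity(ex["candidates"], ex["correct"], rank_candidates)
        for ex in cal_examples
    )
    n = len(scores)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return scores[k]

def prediction_set(candidates: List[str], rank_candidates,
                   threshold: int) -> List[str]:
    """Keep every candidate whose rank is within the calibrated threshold;
    marginal coverage of at least 1 - alpha follows from exchangeability."""
    ordering = rank_candidates(candidates)
    return [candidates[i] for pos, i in enumerate(ordering) if pos <= threshold]
```

This sketch uses a single global quantile; the conditional-coverage behavior the abstract emphasizes would presumably replace it with thresholds that adapt to prompt difficulty, a refinement omitted here.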
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16167