Keywords: Large Language Models, Conformal Factuality, Local Coverage, Model Aggregation
Abstract: With the growing generative capabilities of large language models (LLMs) in question answering, their practical deployment is hindered by unreliable outputs. Conformal methods have been introduced to control sub-claim factuality with theoretical guarantees.
In particular, Conformal Factuality (CF) offers a marginal guarantee that the overall error rate stays below $\alpha$ via a global filtering threshold. Conditional Conformal (CC) aims to improve information retention by learning localized thresholds that optimize the number of retained sub-claims over a user-specified function class. However, its local coverage is unstable due to sensitivity to the choice of function class, and its training cost is high because conformal prediction must be re-computed at every gradient update. To address these issues, we propose a lightweight framework, Localized Conformal Factuality enhanced by multi-model Aggregation (AggLCF), with rigorous marginal coverage guarantees. By semantically clustering diverse responses from multiple LLMs and extracting structured features, AggLCF learns a localized threshold that empirically achieves $1 - \alpha$ coverage per question while maximizing information retention. Requiring neither fine-tuning, a user-specified function class, nor per-update re-computation, AggLCF outperforms the previous state-of-the-art conditional conformal method, achieving both marginal and localized coverage on challenging inputs from the MedLFQA benchmark while retaining the largest number of valid sub-claims.
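As a minimal sketch of the split-conformal calibration underlying CF's marginal guarantee (all variable names and the uniform calibration scores below are illustrative assumptions, not the paper's implementation): a global filtering threshold is chosen as a finite-sample-corrected quantile of calibration nonconformity scores, so an exchangeable test point is covered with probability at least $1 - \alpha$.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold at miscoverage level alpha.

    With n calibration scores, taking the empirical quantile at
    level ceil((n + 1) * (1 - alpha)) / n yields marginal
    1 - alpha coverage for an exchangeable test score.
    """
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

# Hypothetical calibration set of sub-claim nonconformity scores.
rng = np.random.default_rng(0)
cal = rng.uniform(size=1000)
tau = conformal_threshold(cal, alpha=0.1)
# Sub-claims whose score is <= tau would be retained in the output.
```

A localized variant, as the abstract describes, would replace the single global `tau` with a per-question threshold predicted from features of the clustered multi-model responses, while preserving the marginal guarantee.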
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 10741