Keywords: Query-efficient learning, Hallucination mitigation, Projected stochastic gradient descent (PSGD), Active learning / membership queries
Abstract: Hallucinations—where generative models produce invalid or nonsensical outputs—remain a critical challenge for reliable deployment. We present the first computationally and query-efficient algorithm that provably addresses the hallucination problem by actively querying the model’s own invalid outputs. Specifically, we impose a strict constraint on the hallucination rate while maximizing the likelihood of valid target examples via projected stochastic gradient descent. Our method works in very general settings with arbitrary distributions parameterized by sufficiently expressive exponential families. Our approach is enabled by a novel connection to the field of truncated statistics and settles an open problem posed by Hanneke et al. (2018).
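The abstract describes a constrained maximum-likelihood procedure: gradient updates on the likelihood of valid examples, interleaved with a projection that keeps the model's hallucination rate below a fixed level, using membership queries on the model's own outputs. The Python sketch below only illustrates that high-level idea under assumptions introduced here; `feature_fn`, `sample_fn`, `is_valid_oracle`, and the damping-based feasibility step are hypothetical stand-ins, since the abstract does not specify the paper's actual projection operator.

```python
import numpy as np

def psgd_low_hallucination(theta0, valid_data, feature_fn, sample_fn,
                           is_valid_oracle, eps=0.05, lr=0.1, steps=200,
                           n_model_samples=256, seed=0):
    """Illustrative constrained-likelihood loop (not the paper's algorithm).

    theta0          : initial natural parameters of an exponential-family model
    valid_data      : array of valid target examples
    feature_fn      : sufficient statistic T(x), applied row-wise to an array (assumed)
    sample_fn       : sample_fn(theta, n, rng) -> n samples from p_theta (assumed)
    is_valid_oracle : membership query; True iff a sampled output is valid (assumed)
    eps             : maximum allowed hallucination rate
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    t_data = feature_fn(valid_data).mean(axis=0)   # empirical sufficient statistics

    for _ in range(steps):
        # Stochastic log-likelihood gradient for an exponential family:
        # E_data[T(x)] - E_theta[T(x)], with the model term estimated from fresh samples.
        xs = sample_fn(theta, n_model_samples, rng)
        theta = theta + lr * (t_data - feature_fn(xs).mean(axis=0))

        # Feasibility step: query validity of the model's own samples and, if too
        # many are invalid, damp theta.  This halving heuristic is a placeholder
        # for the paper's projection onto the low-hallucination parameter set.
        for _ in range(20):
            xs = sample_fn(theta, n_model_samples, rng)
            invalid_rate = 1.0 - np.mean([is_valid_oracle(x) for x in xs])
            if invalid_rate <= eps:
                break
            theta = 0.5 * theta
    return theta
```

In the setting sketched above, the projection would ideally map the parameters onto the set whose invalid-output mass is at most eps; the halving step is only a crude, sample-based substitute for that operation.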
Primary Area: learning theory
Submission Number: 19980