Scaling Language Model Reliability via Determinantal Point Process Prompt Sampling

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Language Model Reliability, Prompt Sampling, Determinantal Point Process
Abstract: Language models achieve stronger performance when given multiple opportunities to solve a task, as in best-of-$N$ inference. However, naive approaches to scaling at test time—such as high-temperature sampling or random prompt ensembling—suffer from correlated failures, where many attempts repeat the same mistakes. We argue that improving pass@$k$ performance requires selecting prompts that are individually strong at eliciting correct answers while also nudging the model toward semantically distinct reasoning paths. To this end, we introduce a lightweight, query-conditioned framework for prompt selection based on Determinantal Point Processes (DPPs). We build an accuracy–diversity target kernel by combining accuracy labels with hidden-activation similarities, and train a small encoder to approximate this target kernel. The encoder is optimized via a Kullback–Leibler divergence objective, which admits an unbiased gradient estimator. Given an inference budget of $k$ generations, the encoder alone is used to construct the test-time DPP and sample a diverse subset of $k$ prompts that maximizes coverage of complementary reasoning paths. Experiments on multiple benchmarks demonstrate that our approach outperforms competitive baselines.
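The abstract's core mechanism—an accuracy–diversity kernel whose determinants reward subsets of prompts that are both individually accurate and mutually dissimilar—can be illustrated with a minimal sketch. The quality–diversity decomposition $L = \mathrm{diag}(q)\,S\,\mathrm{diag}(q)$ and the greedy MAP selection below are standard DPP constructions, not the paper's actual kernel or sampler; the accuracy scores `q` and prompt embeddings are toy stand-ins for the paper's accuracy labels and hidden activations.

```python
import numpy as np

def build_kernel(q, emb):
    """Quality-diversity DPP kernel: L = diag(q) S diag(q), where
    S is the cosine-similarity matrix of prompt embeddings and
    q holds per-prompt accuracy scores (toy stand-ins here)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    S = emb @ emb.T
    return np.outer(q, q) * S

def greedy_kdpp(L, k):
    """Greedy MAP approximation for a k-DPP: repeatedly add the prompt
    that most increases log det of the selected submatrix, i.e. the
    best accuracy gain that is not redundant with prompts already chosen."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return selected

# Toy example: prompts 0 and 1 elicit identical reasoning (same embedding),
# prompt 2 is distinct but slightly less accurate.
q = np.array([1.0, 0.9, 0.8])
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
L = build_kernel(q, emb)
print(greedy_kdpp(L, k=2))  # picks the accurate-but-diverse pair {0, 2}
```

Note that the determinant of any subset containing both redundant prompts 0 and 1 is (numerically) zero, so the greedy sampler skips the correlated failure mode that plain top-$k$-by-accuracy selection would fall into.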
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 24920