GuidedSampling: Improving Diversity for Training Large Language Models

ACL ARR 2025 May Submission5083 Authors

20 May 2025 (modified: 03 Jul 2025) · License: CC BY 4.0
Abstract: Repeated Sampling (RS) is a simple yet effective inference-time strategy that has been shown to enhance performance on complex tasks. Although its integration into post-training has yielded pass@k improvements, RS often struggles to generate diverse solution candidates (i.e., it explores little of the solution space). Due to this lack of diversity, multiple samples are often redundant, as they apply the same underlying approach to a given problem. To address these limitations, we propose a new inference strategy, \textsc{GuidedSampling}, which decouples the exploration and generation phases at inference time, increasing diversity during sampling. The exploration phase proposes multiple concepts that could be used to solve the problem, while the generation phase conditions on a particular concept to produce a final solution. Experimental results show that \textsc{GuidedSampling} improves the rate of finding correct solutions by up to $\sim34.6$% over a strong baseline. Furthermore, models trained on trajectories generated via \textsc{GuidedSampling} exhibit substantial pass@10 improvements, including $17$% on MATH, $11.12$% on GPQA-Diamond, and $5.49$% on HumanEval, compared to models trained with traditional RS.
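The abstract's two-phase scheme can be illustrated with a toy sketch. The paper's actual prompts and model calls are not given here, so `llm` below is a hypothetical stand-in for a language-model query, and the prompt strings and parameter names (`n_concepts`, `k_per_concept`) are illustrative assumptions, not the authors' implementation:

```python
import random

# Hypothetical stand-in for an LLM call; a real implementation would query a model.
def llm(prompt: str) -> str:
    concepts = ["dynamic programming", "greedy pairing", "binary search"]
    if prompt.startswith("List a concept"):
        return random.choice(concepts)
    # Echo which concept the generation was conditioned on.
    return "solution using " + prompt.split("using ")[-1]

def guided_sampling(problem: str, n_concepts: int = 3, k_per_concept: int = 2):
    """Sketch of GuidedSampling: explore concepts, then generate per concept."""
    # Phase 1 (exploration): collect distinct high-level concepts for the problem.
    concepts = set()
    while len(concepts) < n_concepts:
        concepts.add(llm("List a concept for: " + problem))
    # Phase 2 (generation): condition each sampled solution on one concept,
    # so the candidate pool spans several distinct approaches.
    solutions = []
    for concept in sorted(concepts):
        for _ in range(k_per_concept):
            solutions.append(llm(f"Solve '{problem}' using {concept}"))
    return solutions

sols = guided_sampling("pair socks optimally")
print(len(sols))  # n_concepts * k_per_concept candidates
```

The contrast with plain Repeated Sampling is that RS would issue all `n_concepts * k_per_concept` calls against the same undifferentiated prompt, so samples tend to reuse one approach; here each batch of generations is pinned to a different concept discovered in the exploration phase.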
Paper Type: Long
Research Area: Generation
Research Area Keywords: Inference-time algorithm, data diversity, LLMs
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 5083