Towards Algorithmic Diversity with Semantic Seed Sampling

ICLR 2026 Conference Submission 13796 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: large language models, diversity sampling, code generation
TL;DR: self-generated semantic seeds guide exploration, yielding diverse solutions for improved code generation
Abstract: Large language models (LLMs) combined with evolutionary search techniques have achieved remarkable results in challenging open-ended domains such as competitive programming and mathematical discovery. A key ingredient of such methods is solution space exploration, typically performed by sampling a large pool of candidates with high temperature. However, such sampling has been widely critiqued for providing little semantic diversity and introducing syntactic errors in structured domains such as code and math. We propose \textit{semantic seed sampling}, a simple training-free method for controllable exploration. The model first generates a small set of semantically meaningful seeds (short hints or ideas), appends them to the task description, and samples solutions from each seed-conditioned prompt. We observe that semantic seed sampling explores disjoint neighborhoods of the solution space whose combined coverage is substantially larger than that of high-temperature sampling alone. As part of a Best-of-N pipeline, our method yields relative gains of up to 13.8\%, while remaining token-efficient. We provide a theoretical explanation for the near-optimality of small per-seed budgets, supporting it with empirical evidence. These results highlight efficient solution space exploration as an underappreciated and promising direction for improving LLMs' problem-solving abilities.
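To make the sampling loop described in the abstract concrete, here is a minimal Python sketch. It assumes a hypothetical `generate(prompt, n, temperature)` helper that returns a list of LLM completions; the prompts, budgets, and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of semantic seed sampling (illustrative, not the authors' code).
# `generate(prompt, n, temperature)` is a hypothetical helper returning a list of
# n text completions from an LLM; substitute your own model client.

def semantic_seed_sampling(task, generate, num_seeds=8, budget=64, temperature=0.8):
    """Return a pool of candidate solutions for downstream Best-of-N selection."""
    # Step 1: ask the model for short, semantically distinct hints ("seeds").
    seed_prompt = (
        f"{task}\n\n"
        f"List {num_seeds} short, distinct high-level ideas for solving this task, "
        "one per line."
    )
    raw = generate(seed_prompt, n=1, temperature=temperature)[0]
    seeds = [line.strip() for line in raw.splitlines() if line.strip()][:num_seeds]

    # Step 2: append each seed to the task description and sample a small
    # per-seed budget of solutions from the seed-conditioned prompt.
    per_seed = max(1, budget // max(1, len(seeds)))
    candidates = []
    for seed in seeds:
        conditioned = f"{task}\n\nHint: {seed}\n\nWrite a complete solution."
        candidates.extend(generate(conditioned, n=per_seed, temperature=temperature))
    return candidates
```

In this sketch the per-seed budget is deliberately small, mirroring the abstract's claim that small per-seed budgets are near-optimal; the pooled candidates would then be scored (e.g., by test cases) in a standard Best-of-N step.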
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13796