The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
Abstract: State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops—sampling diverse chains of thought and reinforcing the highest-scoring ones—primarily optimizing a scalar reward such as correctness. We analyze how this design choice drives the model’s distribution over reasoning paths toward collapse, sharply reducing semantic entropy and undermining creative problem-solving. To diagnose this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow in the space of probability measures on solution traces. STaR, GRPO, and DPO, as well as entropy bonuses, novelty search, and quality–diversity objectives, all emerge as special cases of the same loss. The framework delivers three core results: (i) a diversity decay theorem detailing how scalar-only objectives lead to distinct modes of diversity collapse for STaR, GRPO, and DPO; (ii) diversity-enhancing designs that, by sufficiently incorporating the DCR functional, ensure convergence to a unique, stable, and diverse policy, effectively counteracting collapse; and (iii) simple, actionable recipes for achieving this in practice. DCR thus offers the first principled framework for training LLMs that remain both correct and creative.
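As a rough illustration of the kind of objective the abstract describes (the exact DCR functional is defined in the paper; the symbols below are illustrative placeholders, not the authors' notation), one can write a correctness reward over solution traces augmented by a diversity-promoting regularizer:

\[
\mathcal{J}(\pi_\theta) \;=\; \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\big[r(x, y)\big] \;+\; \beta\, \mathcal{H}\!\big(\pi_\theta(\cdot \mid x)\big) \;-\; \lambda\, D\!\big(\pi_\theta \,\Vert\, \pi_{\mathrm{ref}}\big),
\]

where $r$ scores correctness of a trace $y$, $\mathcal{H}$ is the entropy of the trace distribution, and $D$ is a divergence to a reference policy $\pi_{\mathrm{ref}}$. Setting $\beta = 0$ corresponds to the scalar-reward regime whose collapse modes the abstract analyzes, while $\beta > 0$ corresponds to the entropy-bonus special case mentioned above.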
Submission Number: 2169