The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Published: 03 Feb 2026, Last Modified: 02 May 2026, Venue: AISTATS 2026 Poster, Readers: Everyone, License: CC BY 4.0
Abstract: State-of-the-art post-training pipelines for reasoning LLMs rely on bootstrapped reasoning loops: they sample many traces, score them, and reinforce the highest-scoring ones, typically by correctness. This can improve accuracy while still collapsing the distribution inside the correct set onto a narrow family of redundant strategies, reducing creative problem-solving. To diagnose this failure mode, we introduce Distributional Creative Reasoning (DCR), a variational framework that casts training as gradient flow on the simplex of reasoning traces. The framework yields three core results. First, a diversity-decay analysis shows that STaR-style rejection fine-tuning and exact mean-field GRPO amplify whichever correct trace is already larger, while DPO regresses pairwise correct-trace ratios toward the reference ratios. Second, it explains why entropy and KL can slow or tether collapse but do not reward semantically distinct correct strategies for being distinct, and how a creativity kernel supplies the missing relational term. Third, under mild conditions, the resulting dynamics converge to a unique, stable, and diverse equilibrium, yielding practical guidance for kernel and hyperparameter design. DCR thus offers a principled route to training reasoning LLMs that remain both correct and creative.
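Illustrative sketch (not taken from the paper or its code release): the toy below runs exact softmax policy-gradient ascent on expected correctness over five hypothetical traces, three of them correct. This is an assumed stand-in for the correctness-only training the abstract describes, not the DCR framework itself. The function names, initial probabilities, learning rate, and step count are all made up for illustration. It tracks two diagnostics: accuracy (probability mass on the correct set) and the entropy of the policy conditioned on the correct set.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [v / s for v in z]

def correct_set_stats(p, correct):
    """Return (accuracy, entropy of p conditioned on the correct set)."""
    mass = sum(pi for pi, c in zip(p, correct) if c)
    q = [pi / mass for pi, c in zip(p, correct) if c and pi > 0]
    entropy = -sum(qi * math.log(qi) for qi in q)
    return mass, entropy

def policy_gradient_flow(theta0, reward, lr=0.5, steps=2000):
    """Exact gradient ascent on J = sum_i p_i r_i with a softmax policy.

    For a softmax policy, dJ/dtheta_i = p_i * (r_i - J).
    """
    theta = list(theta0)
    for _ in range(steps):
        p = softmax(theta)
        mean_r = sum(pi * ri for pi, ri in zip(p, reward))
        theta = [t + lr * pi * (ri - mean_r)
                 for t, pi, ri in zip(theta, p, reward)]
    return softmax(theta)

# Hypothetical setup: traces 0-2 are correct (reward 1), traces 3-4 are not.
reward = [1.0, 1.0, 1.0, 0.0, 0.0]
correct = [r > 0 for r in reward]
theta0 = [math.log(v) for v in (0.25, 0.15, 0.10, 0.30, 0.20)]

p0 = softmax(theta0)
p1 = policy_gradient_flow(theta0, reward)
acc0, h0 = correct_set_stats(p0, correct)
acc1, h1 = correct_set_stats(p1, correct)

print(f"accuracy: {acc0:.3f} -> {acc1:.3f}")
print(f"entropy inside correct set: {h0:.3f} -> {h1:.3f}")
print(f"ratio p[0]/p[1]: {p0[0] / p0[1]:.3f} -> {p1[0] / p1[1]:.3f}")
```

In this toy, the logit gap between two correct traces i and j changes by lr * (p_i - p_j) * (1 - J) per step, so the initially more probable correct trace pulls ahead as long as accuracy J is below 1. Accuracy rises while the entropy inside the correct set falls, which is the kind of collapse the abstract refers to. The toy includes no entropy bonus, KL term, or creativity kernel.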
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/maxruizluyten/creative_reasoning_release
Signed Copyright Form: pdf
Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.
Submission Number: 2169