Keywords: math
Abstract: We present a novel approach for controllable mathematical reasoning that leverages
self-optimizing thought vectors with entropy minimization. Our method introduces
learnable thought vectors that dynamically modulate the internal reasoning process
of large language models. Using Gemma-2-9B on GSM8K, we achieve 90.1%
accuracy with a controllability score of 0.42, demonstrating that entropy-based
rewards effectively guide focused reasoning patterns without requiring external
reward annotations. Our analysis reveals distinct thought vector clusters and consistent
low-entropy distributions across control conditions, validating our framework
for controllable AI reasoning.
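The core idea in the abstract, using the negative entropy of the model's output distribution as a reward to optimize a learnable vector that modulates the hidden state, can be sketched in miniature. Everything below is an illustrative assumption, not the paper's implementation: the hidden state `h`, projection `W`, additive modulation `h + v`, and finite-difference ascent are all stand-ins for a real LLM and a real optimizer.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy in nats; low entropy = peaked, "focused" distribution
    return -np.sum(p * np.log(p + 1e-12))

# Hypothetical stand-ins for an LLM's internals (assumed, for illustration only)
rng = np.random.default_rng(0)
h = rng.normal(size=8)          # stand-in hidden state
W = rng.normal(size=(5, 8))     # stand-in output projection (5-way "vocab")
v = np.zeros(8)                 # learnable thought vector, initialized to zero

def reward(vec):
    # Entropy-minimization reward: negative entropy of the modulated output
    return -entropy(softmax(W @ (h + vec)))

# Crude gradient ascent on the reward via central finite differences
eps, lr = 1e-4, 0.5
for _ in range(200):
    g = np.zeros_like(v)
    for i in range(len(v)):
        dv = np.zeros_like(v)
        dv[i] = eps
        g[i] = (reward(v + dv) - reward(v - dv)) / (2 * eps)
    v += lr * g

before = entropy(softmax(W @ h))
after = entropy(softmax(W @ (h + v)))
print(f"entropy before: {before:.3f}, after: {after:.3f}")
```

Maximizing the negative-entropy reward sharpens the output distribution without any external reward labels, which is the property the abstract attributes to entropy-based rewards; in the actual method, gradients would flow through the model rather than through finite differences.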
Submission Number: 118