Keywords: RL, L*, planning, rewriting, symbolic systems, value learning, expression simplification
Abstract: Expression simplification is a central task in both mathematics and computer science, with applications ranging from algebraic reasoning to compiler optimization. The successes of reinforcement learning (RL) in various domains have spurred attempts to apply it to symbolic reasoning tasks. However, RL-based methods frequently underperform specialized solutions. This paper shows theoretically that a poorly designed reward function can be one source of this failure.
Submission Number: 79