Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO

Published: 10 Jun 2025, Last Modified: 15 Jul 2025, Venue: MOSS@ICML2025, License: CC BY 4.0
Keywords: functional decomposition, polynomial decomposition, beam search, reinforcement learning, symbolic reasoning, transformer model
TL;DR: We study the capabilities of small-scale transformer models on multivariate polynomial decomposition. Our approach includes a novel rank-aware reinforcement learning method called Beam Grouped Relative Policy Optimization (BGRPO).
Abstract: We study the capabilities of small-scale transformer models in symbolic reasoning, focusing on multivariate polynomial decomposition, an NP-hard algebraic task with widespread applications in science and engineering. Our approach includes a fine-grained synthetic data generation pipeline, supervised pretraining, beam search, evaluations of scaling behavior and generalizability, and a novel rank-aware reinforcement learning method called Beam Grouped Relative Policy Optimization (BGRPO), which improves accuracy while reducing inference compute by up to 75%. Additionally, our model demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.
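To make the task concrete, the following is a minimal sketch of what it means to check a candidate decomposition: given an expanded multivariate polynomial h, a candidate pair (outer f, inner g) is accepted if expanding f(g(x, y)) reproduces h exactly. The dictionary representation and the helper names (`poly_add`, `poly_mul`, `compose`, `is_decomposition`) are assumptions made for illustration and are not taken from the paper or its code.

```python
from collections import defaultdict

# Hypothetical representation (not from the paper): a multivariate polynomial
# is a dict mapping an exponent tuple to an integer coefficient.
# Example: x^2 + y in variables (x, y) is {(2, 0): 1, (0, 1): 1}.

def poly_add(p, q):
    """Add two polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}

def poly_mul(p, q):
    """Multiply two polynomials term by term."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(a + b for a, b in zip(m1, m2))
            out[mono] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def compose(outer, inner, nvars):
    """Expand outer(inner(x_1, ..., x_n)).

    `outer` is a non-empty univariate polynomial given as {degree: coefficient}.
    """
    result = {}
    power = {(0,) * nvars: 1}  # inner ** 0
    for deg in range(max(outer) + 1):
        coeff = outer.get(deg, 0)
        if coeff:
            scaled = {m: coeff * c for m, c in power.items()}
            result = poly_add(result, scaled)
        power = poly_mul(power, inner)  # advance to inner ** (deg + 1)
    return result

def is_decomposition(target, outer, inner, nvars):
    """True if the expanded composition equals the target polynomial."""
    return compose(outer, inner, nvars) == target

# Inner g(x, y) = x^2 + y and outer f(u) = u^2 + 3u.
inner = {(2, 0): 1, (0, 1): 1}
outer = {2: 1, 1: 3}

# Expanded form h = x^4 + 2x^2*y + y^2 + 3x^2 + 3y.
target = {(4, 0): 1, (2, 1): 2, (0, 2): 1, (2, 0): 3, (0, 1): 3}

print(is_decomposition(target, outer, inner, nvars=2))
```

Checking a candidate in this way is cheap, while finding f and g from h alone is the hard direction the abstract refers to. An exact check of this kind is one plausible way to score candidates returned by beam search, though the paper's actual evaluation procedure is not specified here.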
Code: ipynb
Submission Number: 72