Hierarchical Reinforcement Learning for Sparse-Reward Search in Commutative Algebra

Giorgi Butbaia; Paul Orland; Coco Huang; Davide Passaro; Lucas Fagan; Michele Tarquini; Hailong Dao; David Eisenbud; Ali Shehper; Sergei Gukov

Published: 30 Apr 2026, Last Modified: 24 Jun 2026; ICML 2026 regular; CC BY 4.0

TL;DR: We show that explicitly learning temporal abstractions via a constrained options-based HRL framework enables sparse-reward search in commutative algebra.

Abstract: Applying machine learning to long-standing mathematical conjectures is particularly challenging because the resulting search problems have extremely sparse rewards. As an illustrative example, we consider Kalai's algebraic Hirsch conjecture and recast the construction of its counterexamples as a sparse-reward reinforcement learning problem on graphs. We propose a constrained options-based HRL framework with an equivariant graph neural network policy, which allows us to learn useful temporal abstractions for this task. We evaluate our approach over a wide range of degrees and demonstrate that it consistently outperforms classical RL algorithms as well as greedy search. By exploiting the hierarchical structure of the problem, we provide a first-of-its-kind application of HRL to a problem in commutative algebra.

Lay Summary: Some mathematical problems are hard for reinforcement learning for three combined reasons: episodes are very long, successful episodes are extremely rare, and there is almost no feedback until an episode ends. In this paper, we study one such problem from commutative algebra, related to Kalai’s formulation of the algebraic Hirsch conjecture. We formulate the search for counterexamples to the conjecture as a reinforcement learning task on graphs, where the goal is to build special algebraic objects called linear monomial ideals with large diameter. We successfully construct counterexamples using a hierarchical strategy: the algorithm first learns to build useful intermediate structures, which we call spines, and then modifies them into full solutions. We show that this approach, combined with our custom structure-aware graph neural network that captures the key algebraic properties of the ideal, significantly outperforms standard reinforcement learning methods and greedy search. This provides a first example of hierarchical reinforcement learning successfully applied to a needle-in-a-haystack search problem in commutative algebra.

Link To Code: https://github.com/Math-AI-Caltech/alghirsch-hrl

Primary Area: Reinforcement Learning

Keywords: Hierarchical Reinforcement Learning, Sparse Rewards, Constrained Policies, Commutative Algebra, AI for Math, Mathematical Reasoning


Submission Number: 13026
