Keywords: maximum-entropy reinforcement learning, symbolic regression, Bayesian methods, uncertainty quantification
TL;DR: We use maximum-entropy reinforcement learning to approximate a posterior over expressions for symbolic regression.
Abstract: Symbolic regression is the problem of finding an algebraic expression that describes the stochastic dependence of a target variable on a set of inputs. Unlike forms of regression that fit parameters under a fixed model structure, symbolic regression is a search problem over the space of expressions, represented, for example, as abstract syntax trees built from a library of operators. Symbolic regression is typically applied in settings with limited, noisy data in the natural sciences. However, searching for a single best-fitting expression fails to capture the epistemic uncertainty about the expression itself, which motivates a Bayesian perspective: it enables uncertainty quantification and the specification of natural priors to constrain the search space. In this work, we propose ERRLESS (Entropy-Regularised Reinforcement Learning for Expression Structure Sampling), a scalable approach for sampling the posterior distribution over expressions given data using maximum-entropy reinforcement learning. ERRLESS learns a neural policy that constructs expressions sequentially by building up their abstract syntax trees; at convergence, rollouts of this policy yield samples from the posterior, so test-time sampling reduces to running the policy. We demonstrate that ERRLESS achieves near state-of-the-art exact symbolic recovery on the AI Feynman benchmark. Beyond exact recovery, the mean of the posterior predictive approximated by ERRLESS achieves a coefficient of determination ($R^2$) of $0.98$, highlighting the benefits of the Bayesian perspective in symbolic regression.
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 13721
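To make the abstract's construction concrete, below is a minimal sketch (not the authors' ERRLESS implementation) of the two ingredients it describes: a policy that builds an expression autoregressively in prefix notation over an operator library, and an entropy-regularised REINFORCE loss whose soft-optimal policy is proportional to $\exp(\text{reward})$, i.e. the posterior when the reward is the unnormalised log-posterior $\log p(\mathcal{D} \mid e) + \log p(e)$. All names here (`LIBRARY`, `ExpressionPolicy`, `log_posterior`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative operator library: arity determines how many child slots a token opens.
LIBRARY = ["add", "mul", "sin", "x", "const"]
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "const": 0}

class ExpressionPolicy(nn.Module):
    """Recurrent policy that emits one library token per step (prefix notation)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(len(LIBRARY) + 1, hidden)  # +1 for a start token
        self.rnn = nn.GRUCell(hidden, hidden)
        self.head = nn.Linear(hidden, len(LIBRARY))

    def sample(self, max_len=32):
        """Autoregressively build one expression; returns tokens and total log-prob."""
        h = torch.zeros(1, self.rnn.hidden_size)
        prev = torch.tensor([len(LIBRARY)])      # start token
        tokens, logp, open_slots = [], 0.0, 1    # one open slot for the root
        while open_slots > 0 and len(tokens) < max_len:
            h = self.rnn(self.embed(prev), h)
            dist = torch.distributions.Categorical(logits=self.head(h))
            tok = dist.sample()
            logp = logp + dist.log_prob(tok)
            # The token fills one slot and opens arity-many child slots;
            # the tree is complete when no slots remain open. A complete
            # implementation would mask logits to guarantee termination.
            open_slots += ARITY[LIBRARY[tok.item()]] - 1
            tokens.append(LIBRARY[tok.item()])
            prev = tok
        return tokens, logp

def errless_style_loss(policy, log_posterior, batch=64):
    """Score-function estimator of grad KL(policy || posterior): in expectation,
    E[(log pi(e) - reward(e)) * grad log pi(e)], whose fixed point is
    pi(e) proportional to exp(reward(e))."""
    terms = []
    for _ in range(batch):
        tokens, logp = policy.sample()
        reward = log_posterior(tokens)  # log p(D|e) + log p(e), up to a constant
        # Detach the weight so only the score function carries gradient.
        terms.append((logp - reward).detach() * logp)
    return torch.stack(terms).mean()
```

In this sketch, training would repeatedly minimise `errless_style_loss` with a stochastic optimiser; the `logp.detach() * logp` component plays the role of the entropy regulariser, which is what makes the converged policy a sampler for the posterior rather than a point-estimate searcher.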