Discovering Mathematical Formulas from Data via LSTM-guided Monte Carlo Tree Search

23 Sept 2023 (modified: 29 Jan 2024) | ICLR 2024 Conference Withdrawn Submission
Keywords: Symbolic Regression, Long Short-Term Memory network, Monte Carlo Tree Search, Reinforcement Learning
TL;DR: Act like a scientist: derive an interpretable, robust formula from observed data that describes the laws underlying it.
Abstract: Finding a concise, interpretable mathematical formula that accurately describes how each variable in a dataset relates to the predicted value is a crucial task in scientific research and a significant challenge in artificial intelligence. This problem, known as symbolic regression, is an NP-hard combinatorial optimization problem. Traditional symbolic regression algorithms typically rely on genetic algorithms; however, these approaches are sensitive to hyperparameters and often fail to fully recover the target expression. To address these limitations, a symbolic regression algorithm based on Monte Carlo Tree Search (MCTS) was recently proposed. Although it recovers target expressions considerably better than previous methods, it still struggles with complex expressions because of the vast search space, and the lack of guidance during MCTS expansion severely hampers its search efficiency. To overcome these issues, we propose AlphaSymbol, a new symbolic regression algorithm that combines MCTS with a Long Short-Term Memory (LSTM) network. The LSTM guides the MCTS expansion process, which significantly improves search efficiency. The MCTS results are then used to further train the LSTM, so that it provides more accurate guidance in subsequent searches. The two components improve each other in alternation until the target expression is found. We evaluated AlphaSymbol on 222 expressions drawn from more than 10 symbolic regression datasets. The experimental results show that AlphaSymbol outperforms existing state-of-the-art algorithms at accurately recovering symbolic expressions, both with and without added noise.
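The idea of a learned prior steering MCTS expansion can be illustrated with a minimal sketch. It is not the AlphaSymbol implementation: the abstract does not state the selection rule, so the PUCT-style score below (borrowed from AlphaZero-style search) is an assumption, as are the token library, the reward mapping, and every function name. A uniform prior stands in for the LSTM, and the step that retrains the LSTM on MCTS results is omitted.

```python
import math
import random

# Hypothetical token library (name -> arity); expressions are in prefix notation.
LIBRARY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "1": 0}


def uniform_prior(prefix):
    """Stand-in for the LSTM policy: a probability for each candidate next token."""
    return {tok: 1.0 / len(LIBRARY) for tok in LIBRARY}


def open_slots(prefix):
    """Operands still missing; 0 means the expression is complete."""
    return 1 + sum(LIBRARY[tok] - 1 for tok in prefix)


def legal_tokens(prefix, max_len):
    """Tokens that still let the expression be completed within max_len tokens."""
    room = max_len - len(prefix) - open_slots(prefix)
    return [tok for tok, arity in LIBRARY.items() if arity <= room]


def evaluate(prefix, x):
    """Evaluate a complete prefix expression at the point x."""
    def walk(i):
        tok = prefix[i]
        if tok == "x":
            return x, i + 1
        if tok == "1":
            return 1.0, i + 1
        a, i = walk(i + 1)
        if tok == "sin":
            return math.sin(a), i
        b, i = walk(i)
        return (a + b if tok == "add" else a * b), i
    return walk(0)[0]


def reward(prefix, xs, ys):
    """Map the fit error into (0, 1]; 1.0 means an exact fit on the samples."""
    mse = sum((evaluate(prefix, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return 1.0 / (1.0 + math.sqrt(mse))


class Node:
    def __init__(self, prefix, prior):
        self.prefix, self.prior = prefix, prior
        self.children, self.visits, self.total = {}, 0, 0.0

    def puct(self, parent_visits, c=1.0):
        # Assumed selection rule: mean reward plus a prior-weighted exploration bonus.
        q = self.total / self.visits if self.visits else 0.0
        return q + c * self.prior * math.sqrt(parent_visits) / (1 + self.visits)


def rollout(prefix, rng, max_len):
    """Complete a partial expression with uniformly random legal tokens."""
    prefix = list(prefix)
    while open_slots(prefix) > 0:
        prefix.append(rng.choice(legal_tokens(prefix, max_len)))
    return prefix


def search(xs, ys, prior_fn=uniform_prior, iters=2000, max_len=7, seed=0):
    rng = random.Random(seed)
    root = Node([], 1.0)
    best, best_r = None, -1.0
    for _ in range(iters):
        node, path = root, [root]
        while node.children:  # selection: follow the highest PUCT score
            parent = node
            node = max(parent.children.values(), key=lambda ch: ch.puct(parent.visits))
            path.append(node)
        if open_slots(node.prefix) > 0:  # expansion: the prior weights each child
            priors = prior_fn(node.prefix)
            for tok in legal_tokens(node.prefix, max_len):
                node.children[tok] = Node(node.prefix + [tok], priors[tok])
        expr = rollout(node.prefix, rng, max_len)  # simulation
        r = reward(expr, xs, ys)
        if r > best_r:
            best, best_r = expr, r
        for n in path:  # backpropagation
            n.visits += 1
            n.total += r
    return best, best_r
```

Calling `search(xs, ys)` on sampled points returns the best prefix expression found and its reward. Replacing `uniform_prior` with a trained sequence model that returns the same kind of token-to-probability mapping is the point where, in this sketch, learned guidance would enter the search.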
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 6810