Keywords: AI for Science, Large Language Model, Evolutionary Algorithm, Interpretability
TL;DR: An LLM-assisted evolutionary algorithm to discover interpretable, data-efficient, and generalizable formulas for real-world scientific problems.
Abstract: Creating hypotheses for new observations is a key step in the scientific process of understanding a problem in any domain. A good hypothesis is interpretable, reliable (good at predicting unseen observations), and data-efficient, which makes it useful for scientists aiming to make novel discoveries. This paper introduces an automatic way of learning such interpretable and reliable hypotheses in a data-efficient manner. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution), an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create hypotheses as Python programs. Additionally, we propose two improvements, a program critic and a program simplifier, that further help our method produce good hypotheses. We evaluate our method on four different real-world tasks in two scientific domains and show significantly better results. For example, we can learn programs with 35% lower error than the closest non-interpretable baseline for population density estimation.
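As a rough illustration of the idea of evolving hypotheses expressed as Python programs, the toy sketch below runs an elitist evolutionary loop over program text. It is not the DiSciPLE implementation: the function `stub_llm_propose` is a hypothetical stand-in that randomly perturbs two coefficients where the paper would query an LLM, the linear program template and the mean-absolute-error score are assumptions made for this example, and the program critic and simplifier are omitted entirely.

```python
import random
from typing import Callable, List, Tuple

Data = List[Tuple[float, float]]   # (observation x, target y) pairs
Params = Tuple[float, float]       # coefficients of the toy program template


def render(params: Params) -> str:
    """Render a candidate hypothesis as Python source text (assumed template)."""
    a, b = params
    return f"def predict(x):\n    return {a!r} * x + {b!r}\n"


def compile_program(source: str) -> Callable[[float], float]:
    """Turn program text that defines `predict(x)` into a callable."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy; untrusted code needs a sandbox
    return namespace["predict"]


def error(source: str, data: Data) -> float:
    """Mean absolute error of a candidate program on the data."""
    predict = compile_program(source)
    return sum(abs(predict(x) - y) for x, y in data) / len(data)


def stub_llm_propose(parent: Params, rng: random.Random) -> Params:
    """Hypothetical placeholder for an LLM call: randomly perturb the parent."""
    a, b = parent
    return (a + rng.gauss(0.0, 0.5), b + rng.gauss(0.0, 0.5))


def evolve(data: Data, generations: int = 30, population_size: int = 8,
           seed: int = 0) -> str:
    """Elitist loop: propose children, score them, keep the best candidates."""
    rng = random.Random(seed)
    population: List[Params] = [(0.0, 0.0)]
    for _ in range(generations):
        children = [stub_llm_propose(rng.choice(population), rng)
                    for _ in range(population_size)]
        # Parents stay in the pool, so the best error never gets worse.
        pool = sorted(population + children,
                      key=lambda p: error(render(p), data))
        population = pool[:population_size]
    return render(population[0])


if __name__ == "__main__":
    toy_data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    best = evolve(toy_data)
    print(best)
    print("error:", error(best, toy_data))
```

Because the surviving hypothesis is plain source text, it can be read and inspected directly, which is the sense in which program-valued hypotheses are interpretable.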
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5534