README: Rapid Equation Discovery with Multimodal Encoders

Gregory Kang Ruey Lau; Yue Ran Kang; Zi-Yu Khoo; Apivich Hemachandra; Ruth Wan Theng Chew; Bryan Kian Hsiang Low

Published: 09 Jul 2025, Last Modified: 25 Jul 2025, AI4Math@ICML25 Poster, CC BY-NC-SA 4.0

Keywords: equation discovery, multimodal LLMs

TL;DR: README rapidly discovers interpretable symbolic equations for observed data using a novel, efficient image representation, multimodal encoders, and a novel latent-space optimization algorithm.

Abstract: Rapidly discovering scientific laws or interpretable symbolic equations from data is important in many settings, such as decision-making in time-sensitive, high-stakes scenarios, or applications involving interactive or iterative experimentation, as in scientific or machine learning workflows. However, existing methods, generally known as symbolic regression (SR), typically require long computation times to achieve good performance and must run from scratch for each dataset. Recent methods that pre-train SR foundation models for faster inference also suffer from performance limitations and require large training datasets. In this work, we propose README, a framework for rapid equation discovery that can generate performant, interpretable equations from limited, noisy data in just a few seconds and requires significantly less training data than past SR foundation model approaches. We achieve this by being the first to (1) work with image representations of datasets to efficiently capture their key properties, (2) combine the capabilities of open-source pre-trained text and image encoders to produce an informative SR embedding space, and (3) develop a novel Grey Wolf Optimizer with Bayesian Optimization (GWOBO) algorithm that finds the best symbolic expression within seconds. We empirically show that README outperforms benchmarks on a wide range of realistic datasets, including real experimental data from various domains and noisy dynamics extracted from video.
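The abstract names GWOBO but does not give its details. For readers unfamiliar with the Grey Wolf component, the following is a minimal sketch of the standard Grey Wolf Optimizer minimizing a generic black-box objective over a box. It is not README's GWOBO: it omits the Bayesian Optimization component and README's latent-space objective entirely. The function name `grey_wolf_optimize`, the toy `sphere` objective, and all default parameters are hypothetical illustrations.

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_wolves: int = 20,
    n_iters: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over [lower, upper]^dim with the standard Grey Wolf Optimizer.

    Hypothetical illustration only; this is plain GWO, not README's GWOBO.
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    best_x: List[float] = list(wolves[0])
    best_f = float("inf")

    for t in range(n_iters):
        # Rank the pack; the three fittest wolves lead the hunt.
        ranked = sorted(wolves, key=f)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        f_alpha = f(alpha)
        if f_alpha < best_f:
            best_x, best_f = list(alpha), f_alpha

        # Control parameter a decreases linearly from 2 toward 0,
        # shifting the search from exploration to exploitation.
        a = 2.0 - 2.0 * t / n_iters
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                moves = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - x[d])
                    moves.append(leader[d] - A * dist)
                # Average the three leader-guided moves, then clip to the box.
                new_x.append(min(max(sum(moves) / 3.0, lower), upper))
            new_wolves.append(new_x)
        wolves = new_wolves

    # Check the final population too, so the best point ever seen is returned.
    for x in wolves:
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


def sphere(x: Sequence[float]) -> float:
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    x_best, f_best = grey_wolf_optimize(sphere, dim=2, lower=-5.0, upper=5.0)
    print(x_best, f_best)
```

Because GWO only needs function evaluations, it can be applied to black-box objectives such as a score over a learned embedding space. How README defines that score, and how the Bayesian Optimization component is combined with GWO, is not described in this abstract.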

Submission Number: 162

Loading