EvoVLM: Multimodal Evolutionary Feedback for Visual Symbolic Regression

Published: 30 May 2026, Last Modified: 30 May 2026
Venue: ICML2026-AI4Science Poster
License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Visual Symbolic Regression, Explainable AI (XAI), Vision-Language Models, Evolutionary Algorithms
Abstract: While Vision-Language Models (VLMs) have demonstrated remarkable capabilities, their potential for visual reasoning in mathematical discovery remains largely underutilized. To address this, we propose EvoVLM, an automated symbolic regression framework that bridges the gap between visual perception and mathematical formalism. By integrating Pruned Exact Linear Time (PELT) segmentation with small VLMs (sVLMs), EvoVLM visually extracts the structural skeleton of data. To refine the equations, we introduce a Multimodal Evolutionary Feedback loop that supplies the VLM with textual $R^2$ metrics and visual overlay plots at the same time. Across standard time-series datasets, EvoVLM remains competitive with traditional heuristic methods and, in some cases, outperforms them, notably achieving a higher Best $R^2$ than Auto-ARIMA and PySR on complex dynamics such as the AirPassengers dataset. By using graph images and summary statistics as the primary inputs to the VLM, while retaining numerical arrays for segmentation and fitness evaluation, EvoVLM establishes a data-efficient and explainable pipeline for visually driven scientific discovery.
Submission Number: 29
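
The abstract describes scoring candidate equations with $R^2$ on the numerical arrays and passing that score back as text. The sketch below illustrates only that textual half of the feedback, under stated assumptions: the function names `r_squared` and `textual_feedback`, the toy data, and the candidate expressions are all hypothetical and are not taken from the paper. The visual overlay plots and the VLM call are omitted.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for a constant target series")
    return 1.0 - ss_res / ss_tot


def textual_feedback(candidates, t, y):
    """Score each candidate equation and format the scores as text, best first.

    `candidates` maps a printable expression to a callable f(t); this
    interface is an assumption made for illustration.
    """
    scored = [
        (r_squared(y, [f(ti) for ti in t]), name)
        for name, f in candidates.items()
    ]
    scored.sort(reverse=True)  # highest R^2 first
    return [f"{name}: R^2 = {score:.4f}" for score, name in scored]


# Toy series generated from y = 2t + 1 (illustrative data, not from the paper).
t = list(range(20))
y = [2.0 * ti + 1.0 for ti in t]

# Hypothetical candidate equations that a proposal step might emit.
candidates = {
    "2*t + 1": lambda ti: 2.0 * ti + 1.0,
    "t**2": lambda ti: float(ti ** 2),
    "10*sin(t)": lambda ti: 10.0 * math.sin(ti),
}

feedback = textual_feedback(candidates, t, y)
for line in feedback:
    print(line)
```

In a loop of the kind the abstract outlines, lines such as these would presumably accompany an overlay plot of each candidate against the data, so that the model sees both a numeric and a visual signal of fit quality.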