Confirmation: our paper adheres to reproducibility best practices. In particular, we confirm that all important details required to reproduce the results are described in the paper; the authors agree to the paper being made available online through OpenReview under a CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/); and the authors have read and commit to adhering to the AutoML 2025 Code of Conduct (https://2025.automl.cc/code-of-conduct/).
TL;DR: PiML, a novel multi-agent framework for automated machine learning workflow optimization using LLM agents, explores ML problem solving via iterative refinement and systematic planning.
Abstract: In this paper, we introduce PiML, a novel automated pipeline specifically designed for solving real-world machine learning (ML) tasks such as Kaggle competitions. PiML integrates iterative reasoning, automated code generation, adaptive memory construction, and systematic debugging to tackle complex problems effectively. To rigorously assess our framework, we selected 26 diverse competitions from the MLE-Bench benchmark, ensuring comprehensive representation across complexity levels, modalities, competition types, and dataset sizes. We quantitatively compared PiML's performance to AIDE, the best-performing existing baseline from MLE-Bench, across multiple evaluation metrics: Valid Submission rate, Submissions Above Median, Average Percentile Rank, and Medal Achievement Rate. Using the "o3-mini" model, PiML surpassed the baseline in submissions above median (34.61% vs. 30.77%), medal attainment rate (26.92% vs. 23.08%), and average percentile rank (43.75% vs. 39.06%). These results highlight PiML's flexibility, robustness, and superior performance on practical and complex ML challenges.
Submission Number: 47