Keywords: LLM Agent
TL;DR: We introduce MLE-Live, a live framework for evaluating ML agents in community-driven settings, and propose CoMind, a state-of-the-art agent that collaborates and competes like a real Kaggle participant.
Abstract: Large language model-based machine learning (ML) agents have shown great promise in automating ML research.
However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge.
To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community.
Building on this framework, we propose CoMind, a novel agent that excels at exchanging insights and developing new solutions within a community context.
CoMind achieves state-of-the-art performance on MLE-Live and, on average, outperforms 79.2% of human competitors across four ongoing Kaggle competitions.
Supplementary Material: zip
Submission Number: 195