Keywords: automatic research, graph neural network, language agent
Abstract: Scientific artifacts such as models and benchmarks underpin machine learning research. With the rapid growth of repositories like HuggingFace, researchers now have access to millions of artifacts, yet a key challenge remains: how can we automatically discover the state-of-the-art (SOTA) model for a given benchmark by fully leveraging existing artifacts? We formalize this task as automatic SOTA discovery by modeling HuggingFace as an artifact graph, where nodes are models and benchmarks and edges represent evaluations. We propose ArtifactLinker, a two-stage framework: (1) prediction of promising unobserved model–benchmark links using Graph Neural Networks (GNNs) or graph-augmented Large Language Models (LLMs), and (2) verification of the predicted links through fully automatic, reproducible coding experiments run by agents. We further introduce ArtifactGraph, a resource of 2,977 models and 559 benchmarks, to evaluate both stages. Results show that graph-based prediction is effective and that end-to-end automatic verification reliably confirms high-performing candidates.
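The prediction stage treats the artifact graph as bipartite: models on one side, benchmarks on the other, with an edge wherever a model has been evaluated on a benchmark. The sketch below shows what "predicting unobserved model–benchmark links" means on such a graph, under stated assumptions. It assumes evaluation edges arrive as `(model, benchmark)` pairs and scores each unobserved pair with a simple path-counting heuristic (the number of model → benchmark → model → benchmark paths). This heuristic is a stand-in chosen for illustration only; it is not the GNN or graph-augmented LLM predictor ArtifactLinker uses, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_adjacency(edges):
    """Index evaluation edges (model, benchmark) in both directions."""
    model_to_bench = defaultdict(set)
    bench_to_model = defaultdict(set)
    for model, bench in edges:
        model_to_bench[model].add(bench)
        bench_to_model[bench].add(model)
    return model_to_bench, bench_to_model


def score_unobserved_links(edges):
    """Hypothetical heuristic: score each unobserved (model, benchmark) pair
    by counting length-3 paths model -> benchmark' -> model' -> benchmark."""
    m2b, b2m = build_adjacency(edges)
    scores = {}
    for model, bench in product(list(m2b), list(b2m)):
        if bench in m2b[model]:
            continue  # already evaluated, so there is nothing to predict
        paths = 0
        for other_bench in m2b[model]:
            for other_model in b2m[other_bench]:
                # A similar model (shares a benchmark) that was evaluated on `bench`
                if other_model != model and bench in m2b[other_model]:
                    paths += 1
        scores[(model, bench)] = paths
    return scores


def top_candidates(edges, k=3):
    """Return the k highest-scoring unobserved links (ties broken by name)."""
    scores = score_unobserved_links(edges)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy graph with illustrative names (not drawn from ArtifactGraph)
edges = [
    ("bert", "glue"), ("bert", "squad"),
    ("roberta", "glue"), ("roberta", "squad"), ("roberta", "mnli"),
    ("t5", "glue"),
]
print(top_candidates(edges))
# → [(('bert', 'mnli'), 2), (('t5', 'squad'), 2), (('t5', 'mnli'), 1)]
```

In the two-stage setup the abstract describes, the top-ranked pairs would then be handed to the verification stage, where agents run the actual evaluation experiments; a learned predictor would replace the path count with scores computed from node embeddings.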
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI/LLM Agents, NLP Applications, Resources and Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Data resources
Languages Studied: English
Submission Number: 8907