Abstract: Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks. Building on these capabilities, a novel application of LLMs emerges: using LLMs as surrogate models for code execution prediction. Because LLMs can understand and process diverse programs, they offer a promising path toward general-purpose surrogate models. To systematically investigate this capability, we introduce SURGE, a comprehensive benchmark with $1160$ problems covering $8$ key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of $21$ open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings offer insights into the feasibility of using LLMs as efficient surrogates for computational processes.
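To make the surrogate setting concrete, the sketch below illustrates the general idea of treating an LLM as an execution surrogate: the model is prompted to predict a program's output, and its prediction is compared against the output obtained by actually running the code. The prompt wording, the `llm` callable, and the exact-match metric are illustrative assumptions, not the benchmark's actual evaluation protocol.

```python
# Minimal sketch of the "LLM as code-execution surrogate" idea (illustrative only).
import subprocess
import sys


def run_program(source: str, stdin: str = "") -> str:
    """Ground truth: actually execute the program and capture its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        input=stdin, capture_output=True, text=True, timeout=10,
    )
    return result.stdout.strip()


def surrogate_predict(llm, source: str, stdin: str = "") -> str:
    """Surrogate: ask an LLM to predict stdout without running the code.

    `llm` is assumed to be any callable mapping a prompt string to a completion.
    """
    prompt = (
        "Predict the exact standard output of the following Python program.\n"
        f"Program:\n{source}\n"
        f"Standard input:\n{stdin}\n"
        "Output:"
    )
    return llm(prompt).strip()


def exact_match_accuracy(llm, programs: list[tuple[str, str]]) -> float:
    """Fraction of (program, stdin) pairs whose predicted output matches reality."""
    hits = sum(
        surrogate_predict(llm, src, inp) == run_program(src, inp)
        for src, inp in programs
    )
    return hits / len(programs)
```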
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Surrogate Model, Large Language Model
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 7627