Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces

Anjiang Wei; Allen Nie; Thiago S. F. X. Teixeira; Rohan Yadav; Wonchan Lee; Ke Wang; Alex Aiken

Published: 01 May 2025, Last Modified: 23 Jul 2025. Venue: ICML 2025 poster. License: CC BY 4.0

Abstract: Modern scientific discovery increasingly relies on high-performance computing for complex modeling and simulation. A key challenge in improving parallel program performance is efficiently mapping tasks to processors and data to memory, a process dictated by intricate, low-level system code known as *mappers*. Developing high-performance mappers demands days of manual tuning, posing a significant barrier for domain scientists without systems expertise. We introduce a framework that automates mapper development with generative optimization, leveraging richer feedback beyond scalar performance metrics. Our approach features the Agent-System Interface, which includes a Domain-Specific Language (DSL) to abstract away the low-level complexity of system code and define a structured search space, as well as AutoGuide, a mechanism that interprets raw execution output into actionable feedback. Unlike traditional reinforcement learning methods such as OpenTuner, which rely solely on scalar feedback, our method finds superior mappers in far fewer iterations. With just 10 iterations, it outperforms what OpenTuner achieves after 1000 iterations, delivering $3.8\times$ faster performance. Across nine benchmarks, our approach finds mappers that outperform expert-written mappers by up to $1.34\times$ while reducing tuning time from days to minutes.
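The contrast between scalar feedback and interpreted feedback can be illustrated with a toy loop. The following is only a minimal sketch under stated assumptions, not the paper's DSL or AutoGuide: the two-field mapper, the simulated output strings, and the functions `run_mapper`, `interpret`, `propose`, and `optimize` are all hypothetical, and a rule-based `propose` stands in for the LLM.

```python
import re

# Hypothetical toy system: each processor kind can reach exactly one memory.
COMPAT = {"CPU": "SYSTEM", "GPU": "DEVICE"}
TIMES = {"CPU": 4.0, "GPU": 1.0}  # invented execution times in seconds


def run_mapper(mapper):
    """Simulate running a program under a mapper; return raw text output."""
    proc, mem = mapper["processor"], mapper["memory"]
    if COMPAT[proc] != mem:
        return f"error: {proc} task cannot access {mem} memory"
    return f"elapsed: {TIMES[proc]:.2f} s"


def interpret(raw):
    """Turn raw output into (time or None, natural-language feedback)."""
    err = re.search(r"error: (\w+) task cannot access (\w+) memory", raw)
    if err:
        proc, mem = err.groups()
        return None, f"Execution failed: {proc} cannot reach {mem}; move the data."
    ok = re.search(r"elapsed: ([0-9.]+) s", raw)
    if ok:
        return float(ok.group(1)), "Execution succeeded: try another processor kind."
    return None, "Unrecognized output."


def propose(mapper, feedback):
    """Rule-based stand-in for the LLM: revise the mapper using the feedback."""
    new = dict(mapper)
    if feedback.startswith("Execution failed"):
        new["memory"] = COMPAT[mapper["processor"]]
    else:
        new["processor"] = "GPU" if mapper["processor"] == "CPU" else "CPU"
    return new


def optimize(start, iterations):
    """Run the propose/execute/interpret loop and return the best (mapper, time)."""
    best, mapper = None, dict(start)
    for _ in range(iterations):
        elapsed, feedback = interpret(run_mapper(mapper))
        if elapsed is not None and (best is None or elapsed < best[1]):
            best = (dict(mapper), elapsed)
        mapper = propose(mapper, feedback)
    return best


if __name__ == "__main__":
    print(optimize({"processor": "CPU", "memory": "SYSTEM"}, iterations=4))
```

In this sketch a failed run still yields a usable hint (which memory is unreachable), whereas a purely scalar tuner would see only a missing or penalized score for the same run.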

Lay Summary: Can large language models (LLMs) make programs run faster on supercomputers? In this work, we show that they can—by designing a high-level interface that connects an LLM-powered agent with low-level system software. This interface allows LLMs to generate and iteratively refine the high-level programs that control key performance aspects of program execution, without modifying the complex underlying system code. The challenge lies in quickly discovering such effective high-level programs. To address this, we introduce a natural language–based guidance mechanism that interprets execution feedback and helps the LLM improve more efficiently. Our results show that this approach is significantly faster and more effective than traditional reinforcement learning methods. Overall, our work suggests that LLMs could play a major role in solving performance optimization challenges in computer systems.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Primary Area: Deep Learning->Large Language Models

Keywords: Large language model, agent, parallel programming, performance optimization

Submission Number: 2072
