Autocomp: LLM-Driven Code Optimization for Tensor Accelerators

Published: 21 May 2025 · Last Modified: 21 Jun 2025 · MLArchSys 2025 Oral · CC BY 4.0
Presentation: In-Person
Keywords: Large language models, Hardware accelerators, Compilers
Presenter Full Name: Charles Hong
TL;DR: We build an LLM-based system to optimize code for the Gemmini tensor accelerator, and find that it significantly outperforms even code hand-optimized by experts.
Presenter Email: charleshong@berkeley.edu
Abstract: Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown great promise in code generation and optimization tasks, but generating code in low-resource languages, such as those used to program specialized tensor accelerators, still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three categories of representative workloads and two different accelerators, we demonstrate that Autocomp-optimized code runs 5.6x (GEMM) and 2.7x (convolution) faster than the vendor-provided library, and outperforms expert-level hand-tuned code by 1.4x (GEMM), 1.1x (convolution), and 1.3x (fine-grained linear algebra). Additionally, we demonstrate that optimization schedules generated by Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
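
The paper itself is not reproduced on this page, but the abstract's three ingredients (a two-phase plan-then-generate prompt, a concise optimization menu, and per-iteration hardware feedback) suggest a search loop along the lines of the minimal Python sketch below. This is an illustration only: the helpers llm and run_on_hardware, the prompt wording, and the beam-search bookkeeping are all assumed for the sketch and are not Autocomp's actual API or implementation.

    # Hypothetical sketch of an Autocomp-style optimization search, based
    # only on the abstract's description. All names are illustrative.

    def autocomp_search(code, menu, llm, run_on_hardware,
                        iterations=10, beam=4):
        """Iteratively optimize accelerator code with plan-then-generate
        LLM prompts, keeping the fastest correct candidates each round.

        llm: callable(str) -> str, returns a model completion.
        run_on_hardware: callable(str) -> object with .correct (bool)
            and .latency (float), e.g. from simulating the accelerator.
        """
        baseline = run_on_hardware(code)
        best_code, best_latency = code, baseline.latency
        candidates = [code]
        for _ in range(iterations):
            scored = []
            for cand in candidates:
                # Phase 1 (planning): choose an optimization from the menu,
                # injecting domain knowledge via the menu text itself.
                plan = llm(
                    f"Code:\n{cand}\n\nOptimization menu:\n{menu}\n"
                    "Choose one optimization and explain how to apply it."
                )
                # Phase 2 (code generation): rewrite the code per the plan.
                new_code = llm(
                    f"Code:\n{cand}\n\nPlan:\n{plan}\n"
                    "Rewrite the code to implement this plan."
                )
                # Hardware feedback: discard incorrect candidates and
                # rank the rest by measured latency.
                result = run_on_hardware(new_code)
                if result.correct:
                    scored.append((result.latency, new_code))
            scored.sort(key=lambda t: t[0])
            candidates = [c for _, c in scored[:beam]]
            if scored and scored[0][0] < best_latency:
                best_latency, best_code = scored[0]
            if not candidates:
                candidates = [best_code]  # restart from best known code
        return best_code, best_latency

The sketch also hints at why schedule reuse (the abstract's final claim) is plausible: the sequence of plans chosen for one tensor operation can seed the planning phase for a similar operation, saving search samples.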
Presenter Bio: Charles is a rising 4th year PhD student at the University of California, Berkeley, advised by Professor Sophia Shao. He is interested in the intersection between machine learning and computer architecture: both using hardware to accelerate machine learning and using machine learning (particularly LLMs) to accelerate hardware development.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: https://youtu.be/yfEXMMqy19M
YouTube Link Poster: N/A
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Google Slides: https://docs.google.com/presentation/d/1KKy9Qc2kjr5_VQitTNZssmZalRM0MQ4cI2Bf0mlpcvA/edit?usp=sharing
Poster: Yes
Workshop Registration: Yes, the presenter has registered for the workshop.
YouTube Link Short: N/A
Submission Number: 10