LLM agents for kernel development in science

Agents4Science 2025 Conference Submission191 Authors

15 Sept 2025 (modified: 06 Dec 2025) · Agents4Science 2025 Conference Desk Rejected Submission · Readers: Everyone · CC BY 4.0
Keywords: Science, AI
TL;DR: An AI first-author agent that generates, compiles, and profiles Triton/CUDA kernels from high-level prompts or code, selecting the fastest correct design with full provenance.
Abstract: We introduce gburdell3-agent, the first fully autonomous AI author for GPU kernels. Given only a high-level spec, the agent cycles through hypothesis→code→compile→benchmark→verify→write, delivering production-ready kernels and a camera-ready paper without human intervention or fabricated data. All 250 experiments (row-softmax, 2-D stencils, particle filters, KernelBench) run on a single NVIDIA H100 SXM5 80 GB GPU inside a locked-down sandbox (>10^5 trials/day, zero network access, immutable logs). Eight falsifiable hypotheses are tested; six are confirmed and two rejected, with every number linked to an auditable log line. The agent beats PyTorch eager by up to 1.91× and matches or exceeds vendor libraries. Kernel optimisation is thus shown to be an ideal microcosm for cheap-oracle, AI-led science.
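The "fastest correct design" selection described in the abstract can be illustrated with a minimal CPU-only sketch. This is not the agent's implementation: the function names (`select_fastest_correct`, `is_correct`, `median_runtime`), the tolerance, and the toy row-softmax candidates are all hypothetical stand-ins for compiled Triton/CUDA kernels. It only shows the general pattern of verifying each candidate against a reference first, timing only the ones that pass, and recording every outcome in a log.

```python
import math
import time

def reference_softmax(row):
    """Trusted reference: numerically stable row-softmax."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def naive_softmax(row):
    """Hypothetical buggy candidate: overflows on large inputs."""
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(row):
    """Hypothetical correct candidate (same maths as the reference)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def is_correct(candidate, reference, inputs, tol=1e-9):
    """Verify step: a candidate passes only if every output matches."""
    for row in inputs:
        try:
            got = candidate(row)
        except Exception:
            return False  # a crash counts as a failed verification
        want = reference(row)
        if len(got) != len(want):
            return False
        if any(abs(g - w) > tol for g, w in zip(got, want)):
            return False
    return True

def median_runtime(candidate, inputs, repeats=5):
    """Benchmark step: median wall-clock time over several repeats."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for row in inputs:
            candidate(row)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

def select_fastest_correct(candidates, reference, inputs):
    """Return the fastest verified candidate name plus a per-candidate log."""
    log, best = [], None
    for name, fn in candidates.items():
        ok = is_correct(fn, reference, inputs)
        seconds = median_runtime(fn, inputs) if ok else None
        log.append({"name": name, "correct": ok, "seconds": seconds})
        if ok and (best is None or seconds < best[1]):
            best = (name, seconds)
    return (best[0] if best else None), log
```

In this sketch an incorrect candidate is never timed, so a fast but wrong kernel cannot win, and the returned log keeps one record per candidate, which is the kind of information a provenance trail would need.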
Submission Number: 191