Hawkeye: Hardware-Aware GPU Kernel Optimization with Minimal Supervision

Published: 16 Jun 2026, Last Modified: 16 Jun 2026
Venue: ICML 2026 Workshop DL4C
License: CC BY-NC 4.0
Keywords: GPU, kernel optimization, porting, hardware aware
TL;DR: Hawkeye scales the test-time compute of coding agents over a minimal taxonomy of unit tests to port hardware-aware GPU kernels across architectures, vendors, and precisions.
Abstract: Achieving peak GPU kernel performance increasingly relies on architecture-specific optimizations that target new hardware features. While AI coding agents show promise in generating performant kernels, they lack the context needed to implement and stack hardware-specific optimizations effectively, especially on newer GPU architectures. We propose Hawkeye (Hardware-Aware Kernel Optimization), an open-source framework that grounds autonomous kernel generation in a minimal yet comprehensive taxonomy, with only one unit test per optimization strategy per target architecture. Supporting a new accelerator therefore requires only 10 expert-written unit tests per architecture (one per recurring optimization strategy) that generalize across downstream workloads, rather than hand-writing a new kernel for each workload and precision. With this minimal expert supervision, Hawkeye scales the test-time compute of coding agents to generate kernels that consistently leverage hardware-specific features, approaching and even surpassing expert-written PyTorch or Triton implementations in BF16 and emerging low-precision formats (FP8, NVFP4, MXFP4) across Ampere, Hopper, Blackwell, and MI350 GPUs. Hawkeye demonstrates that minimally supervised coding agents can exploit architecture-specific hardware features and reduce the overhead of supporting emerging hardware accelerators.
Submission Number: 62