Hawkeye: Hardware-Aware GPU Kernel Optimization with Minimal Supervision

Arya Tschand; Kesavan Ramakrishnan; Alexander Ingare; Simon Guo; Jeffrey Jian Ma; Zishen Wan; Simran Arora; Azalia Mirhoseini; Vijay Janapa Reddi

Published: 16 Jun 2026, Last Modified: 16 Jun 2026

Venue: ICML 2026 Workshop DL4C

License: CC BY-NC 4.0

Keywords: GPU, kernel optimization, porting, hardware aware

TL;DR: Hawkeye scales the test-time compute of coding agents over a minimal taxonomy of unit tests to port hardware-aware GPU kernels across architectures, vendors, and precisions.

Abstract: Achieving peak GPU kernel performance increasingly relies on architecture-specific optimizations targeting new hardware features. While AI coding agents show promise in generating performant kernels, they lack the necessary context to effectively implement and stack hardware-specific optimizations, especially on newer GPU architectures. We propose Hawkeye (Hardware-Aware Kernel Optimization), an open-source framework that grounds autonomous kernel generation in a minimal and comprehensive taxonomy with only one unit test per optimization strategy per target architecture. Supporting a new accelerator therefore requires only 10 expert-written unit tests per architecture (one per recurring optimization strategy) that generalize across downstream workloads, rather than hand-writing a new kernel for each workload and precision. Hawkeye effectively scales the test-time compute of coding agents with this minimal expert supervision to enable kernel generation that consistently leverages hardware-specific features, approaching and even surpassing expert-written PyTorch or Triton kernels in BF16 and emerging low-precision formats (FP8, NVFP4, MXFP4) across Ampere, Hopper, Blackwell, and MI350 GPUs. Hawkeye demonstrates that minimally supervised coding agents can exploit architecture-specific hardware features and reduce the overhead of supporting emerging hardware accelerators.

Submission Number: 62
