ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs

Yang Bai, Wenqian Zhao, Shuo Yin, Zixiao Wang, Bei Yu

Published: 2023, Last Modified: 23 Jan 2026, EMNLP 2023, CC BY-SA 4.0

Abstract: The training and inference efficiency of ever-larger deep neural networks relies heavily on the performance of tensor operators on specific hardware platforms. Therefore, a compilation-based optimization flow with automatic tensor generation and parameter tuning is necessary for efficient model deployment. While compilation-based methods with performance models can provide dynamic and suitable code optimization, they suffer from exploring a large design space with coarse measurement accuracy and from poor transferability across hardware platforms. This paper presents ATFormer, a simple yet efficient design with attention-inspired modules that accurately predicts the performance of optimized operators by capturing global and long-range dependencies within a complete scheduling space. Compared with state-of-the-art methods, ATFormer can predict the optimal implementation of tensor operators to reduce inference time with minimal effort on modern DNN benchmarks. Furthermore, ATFormer with pre-trained parameters can quickly adapt to different workloads and hardware via transfer learning.