Keywords: Tensor Programs, Compiler, Code optimization, Cost Model, Auto-tuning
TL;DR: We introduce a new dataset for learned tensor compilers, which can be used to accelerate the search in compilers and benchmark different model designs.
Abstract: Search-based tensor compilers can greatly accelerate the execution of machine learning models by generating high-performance tensor programs, such as matrix multiplications and convolutions. These compilers take a high-level mathematical expression as input and search for the fastest low-level implementations. At the core of the search procedure is a cost model which estimates the performance of different candidates to reduce the frequency of time-consuming on-device measurements. There has been a growing interest in using machine learning techniques to learn a cost model to ease the effort of building an analytical model. However, a standard dataset for pre-training and benchmarking learned cost models is lacking. We introduce TenSet, a large-scale tensor program performance dataset. TenSet contains 52 million program performance records collected from 6 hardware platforms. We provide comprehensive studies on how to learn and evaluate the cost models, including data collection, model architectures, loss functions, transfer learning, and evaluation metrics. We also show that a cost model pre-trained on TenSet can accelerate the search time in the state-of-the-art tensor compiler by up to 10$\times$. The dataset is available at https://github.com/tlc-pack/tenset.
Supplementary Material: zip