Abstract: With the growing importance of deploying deep neural networks (DNNs), there is increasing demand to improve both the efficiency and the quality of tensor program optimization (TPO). TPO searches the space of possible program transformations for a given tensor program on target hardware in order to optimize its execution. TPO is challenging and expensive because the number of transformation combinations grows exponentially and because measuring candidate transformations on the device is time-consuming. While prior research has primarily focused on the quality of TPO, i.e., generating high-performance tensor programs, there has been less emphasis on the efficiency of TPO, i.e., optimizing tensor programs with low optimization time overhead. In this paper, we address the primary inefficiencies in current TPO approaches, especially the extensive time required for on-device measurement and the inefficiency of the search process, and aim to reduce the optimization time for DNNs. To this end, we propose a machine learning-based, end-to-end TPO framework named Fasor. Fasor includes three key design components: (1) a transferable cost model with high transfer efficiency that significantly reduces on-device measurement time, (2) a search space shrinking module that prunes program transformations with low optimization potential, and (3) a two-stage fast exploration module that substantially improves search efficiency. Experimental results show that Fasor achieves the best of both worlds in TPO quality and efficiency compared to state-of-the-art TPO frameworks for CPUs and GPUs, contributing to efficient and scalable DNN deployment.