Abstract: As deep learning empowers various fields, many domain-specific non-neural-network operators have been proposed to improve the accuracy of deep learning models. Researchers often express these new operators in the imperative programming paradigm (e.g., PyTorch), leaving their fusion optimization to deep learning compilers. Unfortunately, the side effects inherent in imperative tensor programs, especially tensor-level mutations, often make such optimization extremely difficult. Previous works either fail to eliminate the side effects of tensor-level mutations or require programmers to analyze and transform them manually. In this paper, we present TensorSSA, a holistic functionalization approach to optimizing imperative tensor programs beyond control-flow boundaries. We first introduce the TensorSSA intermediate representation (IR), which removes tensor-level mutations and expands the scope and capability of operator fusion. Building on the TensorSSA IR, we propose a conversion algorithm that performs functionalization across control-flow boundaries. TensorSSA achieves up to a 1.79X speedup (1.34X on average) over state-of-the-art works on representative deep learning tasks.
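To make the core problem concrete, the sketch below contrasts a tensor-level mutation inside control flow with a functionalized equivalent. It is a minimal illustration, not code from the paper: the function names and the roll-style example are assumptions chosen only to show why in-place writes block fusion and why a pure dataflow form does not.

```python
import torch

# Imperative style with a tensor-level mutation inside control flow:
# the in-place write to `out` is a side effect the compiler must
# preserve, which blocks operator fusion across the loop boundary.
# (Illustrative example; not from the paper.)
def roll_mutating(x: torch.Tensor, shift: int) -> torch.Tensor:
    n = x.shape[0]
    out = torch.empty_like(x)
    for i in range(n):
        out[(i + shift) % n] = x[i]  # tensor-level mutation
    return out

# Functionalized (SSA-like) equivalent: every step produces a fresh
# tensor value, so the computation is a pure dataflow graph that a
# compiler is free to fuse with surrounding operators.
def roll_functional(x: torch.Tensor, shift: int) -> torch.Tensor:
    idx = (torch.arange(x.shape[0]) - shift) % x.shape[0]
    return x[idx]  # pure gather; no mutation

x = torch.arange(6, dtype=torch.float32)
assert torch.equal(roll_mutating(x, 2), roll_functional(x, 2))
```

Informally, TensorSSA's conversion automates this kind of rewrite, including for mutations whose targets flow across loop and branch boundaries, which manual functionalization handles poorly.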