Abstract: As deep learning empowers various fields, many domain-specific non-neural-network operators have been proposed to improve the accuracy of deep learning models. Researchers often express these new operators in the imperative programming paradigm (e.g., PyTorch), leaving their fusion optimization to deep learning compilers. Unfortunately, the side effects inherent in imperative tensor programs, especially tensor-level mutations, often make such optimization extremely difficult. Previous works either fail to eliminate the side effects of tensor-level mutations or require programmers to analyze and transform them manually. In this paper, we present TensorSSA, a holistic functionalization approach to optimizing imperative tensor programs beyond control-flow boundaries. We first introduce the TensorSSA intermediate representation (IR), which removes tensor-level mutations and expands the scope and capability of operator fusion. Building on the TensorSSA IR, we propose a conversion algorithm that performs functionalization across control-flow boundaries. TensorSSA achieves up to a 1.79X speedup (1.34X on average) over state-of-the-art works on representative deep learning tasks.
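To make the core problem concrete, the sketch below contrasts a tensor-level mutation inside control flow with a functionalized equivalent. It is a minimal illustration, not code from the paper: the function names and the roll-style example are assumptions chosen only to show why in-place writes block fusion and why a pure dataflow form does not.

```python
import torch

# Imperative style with a tensor-level mutation inside control flow:
# the in-place write to `out` is a side effect the compiler must
# preserve, which blocks operator fusion across the loop boundary.
# (Illustrative example; not from the paper.)
def roll_mutating(x: torch.Tensor, shift: int) -> torch.Tensor:
    n = x.shape[0]
    out = torch.empty_like(x)
    for i in range(n):
        out[(i + shift) % n] = x[i]  # tensor-level mutation
    return out

# Functionalized (SSA-like) equivalent: every step produces a fresh
# tensor value, so the computation is a pure dataflow graph that a
# compiler is free to fuse with surrounding operators.
def roll_functional(x: torch.Tensor, shift: int) -> torch.Tensor:
    idx = (torch.arange(x.shape[0]) - shift) % x.shape[0]
    return x[idx]  # pure gather; no mutation

x = torch.arange(6, dtype=torch.float32)
assert torch.equal(roll_mutating(x, 2), roll_functional(x, 2))
```

Informally, TensorSSA's conversion automates this kind of rewrite, including for mutations whose targets flow across loop and branch boundaries, which manual functionalization handles poorly.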