Abstract: With the rapid development of deep learning, training large neural network models demands enormous amounts of computing power. Therefore, many accelerators have been designed to meet these performance requirements. Recently, the Kunlun series of chips has been released, claiming performance comparable to GPUs. However, there is no end-to-end compiler that supports both training and inference on the Kunlun chip, leaving a large performance optimization space unexplored. This paper presents KunlunTVM, the first end-to-end compiler based on TVM that supports both training and inference tasks on the Kunlun chip. Experimental results show that KunlunTVM achieves up to a 5x training performance improvement over PaddlePaddle, the existing framework supporting the Kunlun chip. It is noteworthy that the proposed methods are general and extensible within the TVM framework when targeting different backends.