Abstract: We propose an approximate tensor processing unit (APTPU), which includes two main components: (1) approximate processing elements (APEs), each consisting of a low-precision multiplier and an approximate adder, and (2) pre-approximate units (PAUs), which are shared among the APEs in the APTPU’s systolic array and function as steering logic that pre-processes the operands and feeds them to the APEs. We conduct extensive experiments to evaluate the performance of the APTPU across various configurations and workloads. The results show that the APTPU’s systolic array achieves up to $5.2\times$ and $4.4\times$ improvements in TOPS/mm$^{2}$ and TOPS/W, respectively, compared to a conventional systolic array design. A comparison between the proposed APTPU and in-house TPU designs shows that we can achieve approximately $2.5\times$ area reduction and $1.2\times$ power reduction while realizing comparable accuracy.
Finally, a comparison with state-of-the-art approximate systolic arrays shows that the APTPU can realize up to $1.58\times$, $2\times$, and $1.78\times$ reductions in delay, power, and area, respectively, while using similar design specifications and synthesis constraints.
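
The abstract does not say which approximation schemes the PAUs and APEs use. The following is only an illustrative functional sketch of the described dataflow, under assumptions that are not taken from the source: operands are unsigned integers, the hypothetical PAU keeps a few leading bits of each operand (`pre_approximate`), and the approximate adder is modeled as a lower-part OR adder (`approx_add`). All function and parameter names are made up for illustration, and the model is not cycle-accurate.

```python
def pre_approximate(x: int, keep_bits: int = 4) -> int:
    """Hypothetical PAU: keep `keep_bits` bits starting at the leading one.

    Assumption: truncation-based pre-approximation of unsigned operands.
    """
    if x == 0:
        return 0
    shift = max(x.bit_length() - keep_bits, 0)
    return (x >> shift) << shift


def approx_add(a: int, b: int, low_bits: int = 4) -> int:
    """Assumed approximate adder (lower-part OR style) for unsigned values.

    The low `low_bits` bits are OR-ed instead of added and no carry is
    propagated into the upper part, which is added exactly.
    """
    mask = (1 << low_bits) - 1
    low = (a & mask) | (b & mask)
    high = ((a >> low_bits) + (b >> low_bits)) << low_bits
    return high | low


def aptpu_matmul(A, B, keep_bits: int = 4, low_bits: int = 4):
    """Functional model of C = A x B on an array of approximate PEs."""
    # PAU stage: each operand is pre-approximated once and then reused by
    # every APE that consumes it, instead of being processed inside each APE.
    A_pre = [[pre_approximate(v, keep_bits) for v in row] for row in A]
    B_pre = [[pre_approximate(v, keep_bits) for v in row] for row in B]

    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                # APE stage: multiply the reduced-precision operands, then
                # accumulate with the approximate adder.
                acc = approx_add(acc, A_pre[i][k] * B_pre[k][j], low_bits)
            C[i][j] = acc
    return C


if __name__ == "__main__":
    A = [[200, 13], [7, 255]]
    B = [[91, 4], [66, 180]]
    print(aptpu_matmul(A, B))                          # approximate result
    print(aptpu_matmul(A, B, keep_bits=8, low_bits=0)) # exact for 8-bit inputs
```

Setting `keep_bits` to the operand width and `low_bits` to 0 disables both approximations, which gives a simple exact baseline for measuring the error introduced by a given configuration.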