Abstract: Advances in AI/ML accelerators have made the core AI/ML computation a relatively insignificant part of application pipelines. For example, with the help of Tensor Cores, inference accounts for only 3% of the latency in an image-based ML pipeline. The mismatch in performance growth between ML model computation and ML-adjacent computation, which produces the inputs and consumes the outputs of ML models, will become a bottleneck that leads to system inefficiency. This paper presents a set of innovative algorithms that allow entire ML-based computer vision pipelines to leverage AI/ML accelerators. The proposed algorithms feature matrix-based operations in which AI/ML accelerators specialize; compiler optimizations alone cannot take full advantage of hardware acceleration without revisiting the algorithms themselves. We implement the proposed algorithms as an open-source library, TensorCV, on a system platform with Tensor Cores. TensorCV achieves a 6.12× speedup on the optimized ML-adjacent functions and saves 81% of energy consumption on modern heterogeneous computers. The code is available at https://github.com/escalab/TensorCV.
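To illustrate the idea of recasting ML-adjacent work as matrix operations (a minimal sketch, not TensorCV's actual API), the following hypothetical PyTorch function expresses RGB-to-grayscale conversion as a half-precision GEMM, the kind of operation that GPU libraries such as cuBLAS can dispatch to Tensor Cores; the function name and the use of BT.601 luma weights are illustrative assumptions.

```python
# Sketch only: restructuring a per-pixel ML-adjacent operation as a GEMM.
# Assumes a CUDA-capable GPU; not part of the TensorCV library itself.
import torch

def rgb_to_gray_matmul(image: torch.Tensor) -> torch.Tensor:
    """image: (H, W, 3) float16 tensor resident on the GPU."""
    # ITU-R BT.601 luma weights, shaped (3, 1) so the conversion becomes
    # an (H*W, 3) x (3, 1) matrix multiplication instead of a scalar loop.
    weights = torch.tensor([[0.299], [0.587], [0.114]],
                           dtype=image.dtype, device=image.device)
    h, w, c = image.shape
    # fp16 matmul is eligible for Tensor Core execution on recent GPUs.
    return (image.reshape(h * w, c) @ weights).reshape(h, w)

if __name__ == "__main__":
    img = torch.rand(1080, 1920, 3, dtype=torch.float16, device="cuda")
    gray = rgb_to_gray_matmul(img)
    print(gray.shape)  # torch.Size([1080, 1920])
```

The restructuring, rather than any compiler pass, is what exposes the work to the accelerator: the same per-pixel arithmetic written as an elementwise loop would not map onto the matrix units.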