Reducing DNN inference latency using DRL

Published: 01 Jan 2022, Last Modified: 16 May 2025, ICTC 2022, CC BY-SA 4.0
Abstract: With the development of artificial intelligence (AI) technology, many applications now provide AI services. At the core of these services are deep neural networks (DNNs), which require substantial computation. However, running the inference process on resource-constrained end devices is usually time-consuming. These limitations have driven the emergence of distributed computing, which performs large volumes of computation by pooling the processing power of many computers connected over the Internet. We study how to efficiently distribute DNN inference jobs in a distributed computing environment so that large amounts of DNN computation can be processed quickly. In this paper, we introduce the training method and results of a Deep Reinforcement Learning (DRL) model that reduces end-to-end latency by observing the state of the distributed computing environment and scheduling DNN jobs accordingly.
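To make the scheduling idea concrete, here is a minimal, self-contained sketch of an RL scheduler in the spirit the abstract describes: an agent observes the load state of several devices and picks one to run each DNN inference job, receiving negative latency as the reward. This is not the paper's actual model; the device speeds, job sizes, queue-drain dynamics, and tabular Q-learning (rather than a deep RL policy) are all simplifying assumptions for illustration.

```python
import random

random.seed(0)

SPEEDS = [1.0, 2.0, 4.0]   # hypothetical compute speeds of three devices
N_DEVICES = len(SPEEDS)
N_LEVELS = 4               # discretized queue-load levels per device

def state_of(queues):
    """Discretize each device's queued work into N_LEVELS buckets."""
    return tuple(min(int(q), N_LEVELS - 1) for q in queues)

def step(queues, action, job_size=1.0):
    """Assign a job to device `action`; latency = queued work + service time."""
    latency = (queues[action] + job_size) / SPEEDS[action]
    queues[action] += job_size
    # every device drains some of its queue between job arrivals
    for i in range(N_DEVICES):
        queues[i] = max(0.0, queues[i] - SPEEDS[i] * 0.5)
    return latency

Q = {}                      # tabular action-value function
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def greedy(state):
    vals = [Q.get((state, a), 0.0) for a in range(N_DEVICES)]
    return vals.index(max(vals))

def train(episodes=200, jobs_per_ep=50):
    for _ in range(episodes):
        queues = [0.0] * N_DEVICES
        for _ in range(jobs_per_ep):
            s = state_of(queues)
            a = random.randrange(N_DEVICES) if random.random() < EPS else greedy(s)
            latency = step(queues, a)
            r = -latency                      # reward: negative end-to-end latency
            s2 = state_of(queues)
            best_next = max(Q.get((s2, b), 0.0) for b in range(N_DEVICES))
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

def evaluate(policy, jobs=100):
    """Average per-job latency under a given scheduling policy."""
    queues = [0.0] * N_DEVICES
    total = 0.0
    for _ in range(jobs):
        total += step(queues, policy(state_of(queues)))
    return total / jobs

train()
trained_latency = evaluate(greedy)
random_latency = evaluate(lambda s: random.randrange(N_DEVICES))
```

After training, the greedy policy routes jobs toward faster, less-loaded devices, so its average latency falls below that of a random scheduler. A deep RL variant would replace the Q table with a neural network over a richer (continuous) state, which is what a DRL approach like the paper's implies.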