Exploiting bias in reinforcement learning for task allocation in a mobile edge computing system

Parth Hiren Shah, Satyadev Badireddi, Raunakk Banerjee, Arghyadip Roy

Published: 01 Jul 2024, Last Modified: 13 May 2026
2024 International Conference on Signal Processing and Communications (SPCOM)
Everyone
WM2024 Conference

Abstract: In an Internet-of-Things (IoT) network, tasks arriving at IoT devices can be executed locally or offloaded to a Mobile Edge Computing (MEC) server. The problem of minimizing the total discounted delay can be formulated as a Markov Decision Process (MDP). Since the arrival rate of incoming tasks is typically unknown, Reinforcement Learning (RL) methods that converge to the optimal solution can be adopted. Q-learning (QL) and Double Q-learning (DQL) are two popular RL algorithms. However, the overestimation bias prevalent in QL and the underestimation bias prevalent in DQL may either help or hurt the learning process, depending on the stochasticity of the various task scheduling strategies. In this paper, we propose a novel adaptive RL algorithm that switches to the appropriate Q-learning variant depending on the IoT network’s stochasticity. Simulation results are presented to evaluate the efficacy of the proposed algorithm against other state-of-the-art solutions.
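For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal sketch of the textbook tabular update rules for QL and DQL. It is not the authors' implementation. It assumes the usual reward-maximizing convention (a delay cost could be encoded as a negative reward), and the function names, the list-of-lists table layout, and the `alpha` and `gamma` defaults are hypothetical choices made for illustration. The adaptive switching rule proposed in the paper is not described in the abstract and is not shown here.

```python
import random


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Tabular Q-learning step (illustrative sketch).

    The same table both selects and evaluates the next action via max(),
    which is the source of the overestimation bias.
    """
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.9, rng=random):
    """Tabular Double Q-learning step (illustrative sketch).

    One table picks the greedy next action and the other evaluates it.
    Decoupling selection from evaluation removes the overestimation but
    can lead to underestimation instead.
    """
    # Choose at random which table is updated on this step.
    if rng.random() < 0.5:
        qa, qb = qb, qa
    # Greedy action according to the table being updated.
    best = max(range(len(qa[s_next])), key=lambda i: qa[s_next][i])
    # Value of that action according to the other table.
    target = r + gamma * qb[s_next][best]
    qa[s][a] += alpha * (target - qa[s][a])
```

Each table is indexed as `q[state][action]`. The `rng` argument accepts any object with a `random()` method, which makes the table choice in `double_q_update` reproducible when needed.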