Abstract: In an Internet-of-Things (IoT) network, incoming tasks at IoT devices can be executed locally or offloaded to a Mobile Edge Computing (MEC) server. The problem of minimizing the total discounted delay can be formulated as a Markov Decision Process (MDP). Since the arrival rate of incoming tasks is typically unknown, Reinforcement Learning (RL) methods that converge to the optimal solution can be adopted. Q-learning (QL) and Double Q-learning (DQL) are two popular RL algorithms. However, the overestimation bias prevalent in QL and the underestimation bias prevalent in DQL may either help or hurt the learning process, depending on the stochasticity of the various task scheduling strategies. In this paper, we propose a novel adaptive RL algorithm that switches to the appropriate Q-learning variant based on the IoT network's stochasticity. Simulation results are presented to evaluate the efficacy of the proposed algorithm against other state-of-the-art solutions.
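
For illustration only (not part of the paper's method), the sketch below contrasts the tabular Q-learning and Double Q-learning updates whose biases the abstract refers to; the state/action sizes, learning rate, and discount factor are assumed placeholders.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 2      # assumed sizes (e.g., local execution vs. MEC offload)
alpha, gamma = 0.1, 0.9          # assumed learning rate and discount factor
rng = np.random.default_rng(0)

Q = np.zeros((N_STATES, N_ACTIONS))    # single table used by Q-learning
QA = np.zeros((N_STATES, N_ACTIONS))   # twin tables used by Double Q-learning
QB = np.zeros((N_STATES, N_ACTIONS))

def q_learning_update(s, a, r, s_next):
    # Bootstraps with the max over the same table, which tends to
    # overestimate action values (QL's overestimation bias).
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def double_q_learning_update(s, a, r, s_next):
    # Selects the greedy action with one table and evaluates it with the
    # other, removing the overestimation but tending to underestimate
    # instead (DQL's underestimation bias).
    if rng.random() < 0.5:
        a_star = np.argmax(QA[s_next])
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        a_star = np.argmax(QB[s_next])
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])
```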