Abstract: In an Internet-of-Things (IoT) network, incoming
tasks at IoT devices can be executed locally or at a Mobile Edge
Computing (MEC) server. The problem of minimizing the total
discounted delay can be formulated as a Markov Decision Process
(MDP) problem. Since the arrival rate of the incoming tasks is
typically unknown, Reinforcement Learning (RL) methods that
converge to the optimal solution can be adopted. Q-learning (QL)
and Double Q-learning (DQL) are two popular RL algorithms.
However, the overestimation and underestimation biases
inherent in the QL and DQL algorithms, respectively, may either
help or hurt the learning process, depending on the stochasticity
of the task scheduling strategy. In this paper, we propose
a novel adaptive RL algorithm that switches to the appropriate
Q-learning variant depending on the IoT network’s stochasticity.
Simulation results are presented to evaluate the efficacy of the
proposed algorithm against other state-of-the-art solutions.
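
For readers unfamiliar with the two baseline algorithms named above, the following is a minimal tabular sketch of the standard QL and DQL update rules. It is not the adaptive algorithm proposed in this paper. The function names, the reward convention (for example, reward taken as negative delay), the step size alpha, and the discount factor gamma are all illustrative assumptions.

```python
import random
from collections import defaultdict


def q_learning_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One standard Q-learning step (reward maximization).

    The max over the same noisy table Q both selects and evaluates the next
    action, which is the usual source of overestimation bias.
    """
    target = reward + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def double_q_learning_update(QA, QB, s, a, reward, s_next, actions,
                             alpha=0.1, gamma=0.9, update_a=None):
    """One standard Double Q-learning step (reward maximization).

    One table selects the greedy next action and the other evaluates it,
    which removes the overestimation but can introduce underestimation.
    update_a is a hypothetical hook to fix which table is updated; when it
    is None, a fair coin decides, as in the usual formulation.
    """
    if update_a is None:
        update_a = random.random() < 0.5
    # 'sel' selects the action and is updated; 'ev' evaluates the action.
    sel, ev = (QA, QB) if update_a else (QB, QA)
    best = max(actions, key=lambda b: sel[(s_next, b)])
    target = reward + gamma * ev[(s_next, best)]
    sel[(s, a)] += alpha * (target - sel[(s, a)])


# Illustrative usage with two actions: 0 = execute locally, 1 = offload to MEC.
Q = defaultdict(float)
q_learning_update(Q, s=0, a=1, reward=-2.0, s_next=1, actions=[0, 1])

QA, QB = defaultdict(float), defaultdict(float)
double_q_learning_update(QA, QB, s=0, a=1, reward=-2.0, s_next=1,
                         actions=[0, 1], update_a=True)
```

An adaptive scheme of the kind described in the abstract would choose between these two update rules at run time; the switching criterion itself is not specified here and is left out of the sketch.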