Abstract: The rapid growth of vehicle-to-network (V2N) communication demands efficient handover decision-making strategies to ensure seamless connectivity and maximum throughput. However, the dynamic nature of V2N scenarios poses challenges for traditional handover algorithms. To address this, we propose a deep reinforcement learning (DRL)-based approach to optimize handover decisions in dynamic V2N communication, leveraging the advantages of transfer learning and meta-learning to generalize across time-evolving source and target tasks. In this paper, we derive generalization bounds for our DRL-based approach, focusing specifically on optimizing the handover process in V2N communication. The derived bounds provide theoretical guarantees on the expected generalization error of the learned handover time function for the target task. To implement our framework, we propose a meta-learning framework, Adapt-to-evolve (A2E), based on double deep Q-networks (DDQN) combined with Thompson sampling. The A2E framework enables quick adaptation to new tasks by minimizing error upper bounds with divergence measures. Through transfer learning, the meta-learner dynamically evolves its handover decision-making strategy to maximize average throughput while reducing the number of handovers. Thompson sampling balances exploration and exploitation within the DDQN, ensuring efficient and effective learning; this combination forms the foundation of the meta-training process and improves cumulative packet loss by 48.02% in highway settings and 46.32% in rural settings.
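To make the two mechanisms named above concrete, the following is a minimal sketch (not the authors' implementation) of Thompson-sampling-style action selection combined with a DDQN target update. Posterior sampling over Q-values is approximated here with a bootstrapped ensemble of Q-networks, one common realization of Thompson sampling for deep Q-learning; all names (`QNet`, `select_action`, `ddqn_target`) and hyperparameters are illustrative assumptions, not the paper's A2E code.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a vehicle/channel state to Q-values over handover actions."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def select_action(ensemble: list, state: torch.Tensor) -> int:
    # Thompson sampling (approximate): draw one Q-network from the ensemble,
    # treat it as a sample from the posterior over value functions,
    # and act greedily with respect to that sample.
    q_sample = random.choice(ensemble)
    with torch.no_grad():
        return int(q_sample(state).argmax().item())

def ddqn_target(online: QNet, target: QNet, r: torch.Tensor,
                s_next: torch.Tensor, done: torch.Tensor,
                gamma: float = 0.99) -> torch.Tensor:
    # DDQN decoupling: the online network selects the next action,
    # the target network evaluates it, reducing Q-value overestimation.
    with torch.no_grad():
        a_next = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_next).squeeze(1)
        return r + gamma * (1.0 - done) * q_next
```

Sampling a whole ensemble member per step (rather than adding per-step noise) is what yields the deep exploration behavior the abstract attributes to Thompson sampling: the agent commits to one plausible value hypothesis long enough to test it.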