Percentile-Based Deep Reinforcement Learning and Reward-Based Personalization for Delay-Aware RAN Slicing in O-RAN
Keywords: Deep reinforcement learning, Federated learning, Personalization, RAN slicing, Wireless networks, Reward function, Resource allocation, Probabilistic constraint, Percentile, Quality of Service.
TL;DR: This paper proposes a percentile-based DRL solution for RAN slicing that meets probabilistic per-user delay upper bounds, and introduces a reward-based personalization technique that outperforms traditional approaches such as federated averaging.
Abstract: In this paper, we tackle the challenge of radio access network (RAN) slicing within an open RAN (O-RAN) architecture. We focus on a network in which multiple mobile virtual network operators (MVNOs) compete for physical resource blocks (PRBs), aiming to meet probabilistic delay upper bound constraints for their clients while minimizing PRB utilization. We first derive a reward function based on the law of large numbers (LLN), then apply practical modifications to adapt it to real-world experimental scenarios. We then propose our solution, Percentile-based Delay-Aware Deep Reinforcement Learning (PDA-DRL), which outperforms several baselines, including DRL models optimized for average delay constraints, achieving a 38% reduction in resulting average delay. Furthermore, we address the issue of model weight sharing among multiple MVNOs to develop a robust personalized model. We introduce a reward-based personalization method in which each agent weights other agents' model parameters according to their performance. This technique surpasses traditional aggregation methods, such as federated averaging, as well as strategies that rely on traffic pattern similarity or model-weight distances.
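To make the probabilistic delay constraint and its LLN-based surrogate concrete, a minimal sketch of one plausible formalization is given below; the symbols $d_k$, $d_k^{\max}$, $\epsilon_k$, and $N$ are assumptions introduced here for illustration, and the paper's exact notation may differ. The LLN argument is that the empirical fraction of packets whose delay exceeds the bound converges to the violation probability, so a percentile of the observed delays can stand in for the probabilistic constraint inside the reward.

```latex
% One plausible formalization (notation assumed, not taken from the paper):
% each MVNO k must keep the probability that a packet's delay d_k exceeds
% the bound d_k^max below a tolerance eps_k.
\begin{align}
  &\Pr\{ d_k > d_k^{\max} \} \le \epsilon_k
    && \text{(probabilistic delay upper bound)} \\
  &\frac{1}{N}\sum_{n=1}^{N} \mathbb{1}\{ d_k(n) > d_k^{\max} \}
    \;\xrightarrow[N\to\infty]{}\; \Pr\{ d_k > d_k^{\max} \}
    && \text{(law of large numbers)}
\end{align}
% Equivalently, the (1 - eps_k)-th percentile of the observed delays must not
% exceed d_k^max, which is the quantity a percentile-based reward can penalize.
```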
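The reward-based personalization described in the abstract can be pictured as a performance-weighted variant of federated averaging. The sketch below is only an illustration under assumed names: `reward_weighted_aggregate`, the softmax weighting, and the `temperature` parameter are not taken from the paper, and the authors' exact weighting rule may differ.

```python
import numpy as np

def reward_weighted_aggregate(local_weights, rewards, temperature=1.0):
    """Aggregate per-agent model weights, weighting each agent by its recent
    reward (performance) via a softmax, rather than uniformly as in
    federated averaging.

    local_weights: list of dicts mapping layer name -> np.ndarray (one per agent)
    rewards:       list of scalar performance scores, one per agent
    """
    scores = np.asarray(rewards, dtype=float) / temperature
    coeffs = np.exp(scores - scores.max())
    coeffs /= coeffs.sum()                      # softmax over agent rewards

    aggregated = {}
    for name in local_weights[0]:
        aggregated[name] = sum(c * w[name] for c, w in zip(coeffs, local_weights))
    return aggregated
```

A plain federated-averaging baseline corresponds to passing equal rewards, so the same routine covers both the uniform and the reward-weighted schemes.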
Submission Number: 4