Federated Offline Reinforcement Learning With Multimodal Data

Published: 01 Jan 2024, Last Modified: 13 May 2025 · IEEE Trans. Consumer Electron. 2024 · CC BY-SA 4.0
Abstract: The Tactile Internet (TI) allows operators to have an immersive experience in a remote environment. In the process, users generate large amounts of demonstration data containing tactile information. It is important to make reasonable use of these user-generated data to improve the intelligence of Tactile Internet applications without infringing on user privacy. To learn from user-generated datasets alone, without expensive environment interaction, this brief introduces conservative policy estimation from offline reinforcement learning to ensure the convergence of the learning algorithm. In addition, a dataset composed of behavior data from different users exhibits a multimodal distribution, in which the same state corresponds to different actions. An offline reinforcement learning algorithm is used to reconstruct and learn user behavior within a federated learning framework, and a diffusion model is introduced to capture the multimodal distribution caused by differing user preferences. On this basis, we propose a federated diffusion Q-learning (FDQL) algorithm and verify its effectiveness on the D4RL benchmark. Experimental results demonstrate that FDQL performs efficiently within the federated learning framework, effectively capturing users' multimodal behaviors and achieving state-of-the-art results.
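To make the described pipeline concrete, below is a minimal sketch of a federated round in the style the abstract outlines: each client fits a diffusion-based policy to its local offline data, and a server aggregates the weights FedAvg-style. This is an illustration under stated assumptions, not the authors' FDQL implementation: the names (`PolicyNet`, `local_update`, `fedavg`), the simplified single-step denoising loss, and the synthetic data standing in for D4RL shards are all hypothetical, and the real FDQL local objective would additionally include the conservative Q-learning term and a full diffusion noise schedule.

```python
import copy
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Toy noise-prediction network standing in for the diffusion-based actor."""
    def __init__(self, state_dim=17, action_dim=6, hidden=256):
        super().__init__()
        # Input: state, noisy action, and diffusion timestep, concatenated.
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),  # predicted noise
        )

    def forward(self, state, noisy_action, t):
        return self.net(torch.cat([state, noisy_action, t], dim=-1))

def local_update(model, data, steps=10, lr=3e-4):
    """One client's local training: a simplified denoising (behavior-cloning)
    loss. A real Diffusion-QL-style update would add a Q-maximization term
    and a conservative regularizer; both are omitted here for brevity."""
    model = copy.deepcopy(model)  # train a local copy of the global model
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    states, actions = data
    for _ in range(steps):
        t = torch.rand(states.size(0), 1)  # diffusion timestep in [0, 1]
        noise = torch.randn_like(actions)
        # Simplified one-shot forward (noising) process.
        noisy = torch.sqrt(1 - t) * actions + torch.sqrt(t) * noise
        loss = ((model(states, noisy, t) - noise) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.state_dict(), states.size(0)

def fedavg(global_model, client_payloads):
    """FedAvg aggregation: average client weights, weighted by local data size."""
    total = sum(n for _, n in client_payloads)
    avg = {k: sum(sd[k] * (n / total) for sd, n in client_payloads)
           for k in client_payloads[0][0]}
    global_model.load_state_dict(avg)
    return global_model

# Toy run with synthetic per-client datasets (placeholders for D4RL shards).
torch.manual_seed(0)
global_model = PolicyNet()
clients = [(torch.randn(64, 17), torch.randn(64, 6)) for _ in range(4)]
for rnd in range(3):  # communication rounds
    payloads = [local_update(global_model, d) for d in clients]
    global_model = fedavg(global_model, payloads)
    print(f"round {rnd}: aggregated {len(payloads)} clients")
```

The diffusion actor is what lets each client represent a multimodal behavior distribution (the same state mapping to different actions across users), while the weighted averaging keeps raw demonstration data on-device, which is the privacy property the abstract emphasizes.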