ENHANCING DATA EFFICIENCY IN REINFORCEMENT LEARNING: A NOVEL IMAGINATION MECHANISM BASED ON MESH INFORMATION PROPAGATION

Published: 22 Oct 2024 · Last Modified: 23 Oct 2024 · NeurIPS 2024 Workshop on Open-World Agents (Poster) · CC BY 4.0
Keywords: Data efficiency, Reinforcement learning, Imagination mechanism, Mesh information propagation, Plug-and-Play module
TL;DR: Imagination Mechanism (IM) boosts RL data efficiency by propagating information across episodes, improving learning in SAC, PPO, DDPG, and DQN.
Abstract: Reinforcement learning (RL) algorithms face the challenge of limited data efficiency, particularly when dealing with high-dimensional state spaces and large-scale problems. Most RL methods rely solely on state transition information within the same episode when updating the agent's Critic, which can lead to low data efficiency and sub-optimal training time consumption. Inspired by human-like analogical reasoning abilities, we introduce a novel mesh information propagation mechanism, termed the 'Imagination Mechanism (IM)', designed to significantly enhance the data efficiency of RL algorithms. Specifically, IM enables information generated by a single sample to be effectively broadcast to different states across episodes, instead of being transmitted only within the same episode. This capability enhances the model's comprehension of state interdependencies and facilitates more efficient learning from limited sample information. To promote versatility, we extend IM to function as a plug-and-play module that can be seamlessly integrated into other widely adopted RL algorithms. Our experiments demonstrate that IM consistently boosts four mainstream SOTA RL algorithms, namely SAC, PPO, DDPG, and DQN, by a considerable margin, ultimately yielding superior performance across various tasks.
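The abstract's core idea, propagating the information from one observed transition to similar states collected across episodes rather than only within the episode it came from, can be illustrated with a minimal sketch. The class below is a hypothetical construction, not the paper's actual implementation: it assumes a buffer of states gathered across episodes, measures state similarity with cosine similarity, and broadcasts a freshly computed TD target to the k most similar stored states, weighted by that similarity. The class name, the choice of similarity measure, and the soft-update rule are all illustrative assumptions.

```python
import numpy as np

class ImaginationMechanism:
    """Hypothetical sketch of a mesh-style propagation step: value
    information from a single sample is broadcast to similar states
    stored across episodes, instead of staying episode-local.
    (Illustrative only; not the paper's implementation.)"""

    def __init__(self, k=3):
        self.k = k
        self.states = []   # states collected across many episodes
        self.values = []   # current value estimates for those states

    def add(self, state, value):
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(value))

    def propagate(self, state, td_target, lr=0.5):
        """Broadcast a TD target computed for `state` to the k most
        similar stored states, weighted by cosine similarity."""
        if not self.states:
            return
        s = np.asarray(state, dtype=float)
        S = np.stack(self.states)
        sims = S @ s / (np.linalg.norm(S, axis=1) * np.linalg.norm(s) + 1e-8)
        for i in np.argsort(sims)[-self.k:]:
            w = max(float(sims[i]), 0.0)   # ignore dissimilar states
            self.values[i] += lr * w * (td_target - self.values[i])
```

In a plug-and-play setting, `propagate` would be called after each ordinary Critic update, so that a single sample's TD target nudges the value estimates of related states it would otherwise never touch.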
Submission Number: 15