Abstract: This paper introduces two novel learning schemes for distributed agents in continuous action-based Reinforcement Learning (RL) environments: Reward-Weighted (R-Weighted) and Loss-Weighted (L-Weighted) gradient merging. Traditional methods aggregate gradients through simple summation or averaging, which may not effectively capture the diverse learning strategies of agents operating in different environments.
This aggregation can lead to suboptimal updates by diluting the influence of more informative gradients.
To address this, our proposed methods adjust the gradients of each agent based on its episodic performance, scaling them by the episodic reward (R-Weighted) or episodic loss (L-Weighted).
By giving more weight to gradients from more successful or informative episodes, these methods aim to prioritize the most relevant learning signals, enhancing overall training efficiency.
Each agent operates with identical neural network parameters but within differently initialized versions of the same environment, resulting in distinct gradients from each actor.
By weighting the gradients according to their rewards or losses, we enable agents to share their learning potential, focusing on environments with richer or more critical information.
We empirically demonstrate that the L-Weighted method outperforms state-of-the-art approaches in various RL environments, including CartPole, LunarLander, HumanoidStandup, and Half-Cheetah, with an average of 13.84% higher cumulative reward.
The R-Weighted approach performs similarly to state-of-the-art methods, with a minor improvement of 2.33% higher cumulative reward.
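A minimal sketch of how such performance-weighted gradient merging could look is given below; the function name `merge_gradients`, the softmax normalization of the per-agent weights, and the `invert` flag for loss-based weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def merge_gradients(per_agent_grads, episodic_scores, invert=False):
    """Merge per-agent gradients, weighting each agent by its episodic score.

    per_agent_grads: list (one entry per agent) of lists of gradient tensors,
                     all agents sharing the same parameter ordering.
    episodic_scores: 1-D tensor with one scalar per agent -- episodic reward
                     (R-Weighted) or episodic loss (L-Weighted).
    invert: assumption -- if True, smaller scores receive larger weights
            (e.g. if a lower loss should count more); the paper's exact
            weighting rule may differ.
    """
    scores = episodic_scores.float()
    if invert:
        scores = -scores
    # Softmax-normalized weights over agents (assumed normalization scheme).
    weights = torch.softmax(scores, dim=0)

    merged = []
    for param_grads in zip(*per_agent_grads):      # iterate parameter-wise
        stacked = torch.stack(param_grads, dim=0)  # (num_agents, *param_shape)
        w = weights.view(-1, *([1] * (stacked.dim() - 1)))
        merged.append((w * stacked).sum(dim=0))    # weighted merge per parameter
    return merged

# Example: three agents with episodic rewards 1.0, 5.0, 2.0 (R-Weighted merge)
# merged = merge_gradients(grads_per_agent, torch.tensor([1.0, 5.0, 2.0]))
```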
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Wa7o1Oeru1
Changes Since Last Submission: Changed the LaTeX engine to one that adheres to the TMLR text style, and fixed the scaling of figures based on width.
Assigned Action Editor: ~Sai_Aparna_Aketi1
Submission Number: 3243