Concurrent Reinforcement Learning with Aggregated States via Randomized Least Squares Value Iteration

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We prove a worst-case regret bound for concurrent RLSVI, reducing the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in regret compared to the single-agent versions of Russo (2019) and Agrawal et al. (2021).
Abstract: Designing learning agents that explore efficiently in a complex environment has long been recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions for a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurrently explore an environment. The theoretical results established in this work provide an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with an aggregated state representation, and establish polynomial worst-case regret bounds in both finite- and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of $\Theta\left(\frac{1}{\sqrt{N}}\right)$, highlighting the advantage of concurrent learning. Compared to Russo (2019) and Agrawal et al. (2021), our algorithm requires significantly less memory: we reduce the space complexity by a factor of $K$ while incurring only a $\sqrt{K}$ increase in the worst-case regret bound. Interestingly, our algorithm also improves the worst-case regret bound of Russo (2019) by a factor of $H^{1/2}$, matching the improvement of Agrawal et al. (2021), but this result is achieved through a fundamentally different algorithmic enhancement and proof technique. Additionally, we conduct numerical experiments to illustrate our theoretical findings.
Lay Summary: Reinforcement learning (RL) trains computer programs (agents) to make optimal decisions through trial-and-error interactions with complex environments. One significant challenge is helping multiple agents explore efficiently when working together, as current methods mainly address single-agent scenarios. In our study, we developed a new approach that helps groups of agents collectively learn faster and more efficiently. We adapted a method called concurrent randomized least-squares value iteration (RLSVI) with aggregated states, allowing multiple agents to explore concurrently by sharing and simplifying their knowledge of the environment. Our theoretical results demonstrate that each agent learns faster as more agents participate, significantly speeding up overall learning. Additionally, our method greatly reduces the memory requirements compared to existing work, making it practical for systems with limited resources. Compared to previous research, our approach achieves better performance through simpler computations and innovative theoretical insights. Through experiments, we confirmed that our theory translates into practical performance improvements. This work advances our understanding of multi-agent learning, providing a scalable solution for real-world applications.
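For readers unfamiliar with randomized least-squares value iteration, the sketch below illustrates the flavor of update the abstract refers to: a regularized least-squares fit over aggregated state-action cells with Gaussian perturbations that drive exploration, computed from transitions pooled across concurrently acting agents. This is a minimal illustrative sketch, not the authors' algorithm; all names (`rlsvi_q_update`, `phi`, `buffer`, `sigma`, `lam`) are hypothetical and the noise scaling is simplified. See the linked repository for the actual method.

```python
import numpy as np

def rlsvi_q_update(buffer, phi, num_agg, num_actions, H, sigma=1.0, lam=1.0):
    """One randomized least-squares value iteration pass (illustrative sketch).

    buffer[h] : list of (s, a, r, s_next) transitions at step h,
                pooled across concurrently acting agents
    phi(s)    : maps a raw state to an aggregated-state index in [0, num_agg)
    Returns Q of shape (H + 1, num_agg, num_actions); Q[H] is identically zero.
    """
    Q = np.zeros((H + 1, num_agg, num_actions))
    for h in range(H - 1, -1, -1):
        # Regularized least squares over aggregated (state, action) cells.
        # With one-hot aggregated features this reduces to a per-cell average.
        sum_y = np.zeros((num_agg, num_actions))
        count = np.zeros((num_agg, num_actions))
        for (s, a, r, s_next) in buffer[h]:
            y = r + Q[h + 1, phi(s_next)].max()   # backed-up target value
            sum_y[phi(s), a] += y
            count[phi(s), a] += 1
        mean = sum_y / (count + lam)
        # Gaussian perturbation injects exploration; its scale shrinks as
        # visits to each aggregated cell accumulate.
        noise = np.random.randn(num_agg, num_actions) * sigma / np.sqrt(count + lam)
        Q[h] = mean + noise
    return Q
```

The design point this sketch tries to convey is that the perturbation scale decays with the per-cell visit count, so randomized exploration fades as data accumulates, and that working over aggregated cells rather than raw states is what makes the per-episode update a small table operation.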
Link To Code: https://github.com/yz2/rlsvi_code
Primary Area: Reinforcement Learning->Multi-agent
Keywords: worst-case regret bound, RL theory, randomized least squares value iteration, multi-agent learning, aggregated states, concurrent learning
Submission Number: 8213