Keywords: meta-reinforcement learning
TL;DR: We propose incorporating action-values, learned online via traditional RL, as inputs to meta-RL, and show that the resulting approach earns greater cumulative reward over longer adaptation periods and generalizes better to out-of-distribution tasks.
Abstract: Meta-reinforcement learning (meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, they show poor asymptotic performance and struggle with out-of-distribution tasks because they rely on sequence models, such as recurrent neural networks or transformers, to process experiences rather than summarize them using general-purpose RL components such as value functions. In contrast, traditional RL algorithms are data-inefficient because they do not use domain knowledge, but they do converge to an optimal policy in the limit. We propose RL$^3$, a principled hybrid approach that incorporates action-values, learned per task via traditional RL, into the inputs to meta-RL. We show that RL$^3$ earns greater cumulative reward in the long term than RL$^2$, drastically reduces meta-training time, and generalizes better to out-of-distribution tasks. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.
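To make the core idea concrete, here is a minimal sketch in Python (not the authors' implementation) of the input augmentation the abstract describes: per-task action-values are estimated online with tabular Q-learning and concatenated with the usual RL$^2$ inputs (observation, previous action, reward, done flag) before being fed to the meta-learner's sequence model. All names and hyperparameters here (`TabularQ`, `rl3_input`, `lr`, `gamma`) are illustrative assumptions.

```python
import numpy as np

class TabularQ:
    """Online tabular Q-learning, reset at the start of each new task."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99):
        self.q = np.zeros((n_states, n_actions))
        self.lr, self.gamma = lr, gamma

    def update(self, s, a, r, s_next, done):
        # One-step TD update toward r + gamma * max_a' Q(s', a').
        target = r + (0.0 if done else self.gamma * self.q[s_next].max())
        self.q[s, a] += self.lr * (target - self.q[s, a])

def rl3_input(obs_onehot, prev_action_onehot, reward, done, q_values):
    """Concatenate the standard RL^2 inputs with the current state's
    Q-estimates; the result is one timestep of input to the meta-policy's
    recurrent network or transformer."""
    return np.concatenate([obs_onehot,
                           prev_action_onehot,
                           np.array([reward, float(done)]),
                           q_values])

# Usage within one sampled task: after every transition (s, a, r, s'),
# call q.update(s, a, r, s_next, done), then feed
# rl3_input(obs, prev_act, r, done, q.q[s_next]) to the sequence model.
```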
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13056