Meta-Learning and Meta-Reinforcement Learning - Tracing the Path towards DeepMind's Adaptive Agent

TMLR Paper6536 Authors

17 Nov 2025 (modified: 17 Feb 2026) | Rejected by TMLR | CC BY 4.0

Abstract: Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability that standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey provides a rigorous, task-based formalization of meta‑learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind’s Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.

Submission Type: Long submission (more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=TG1QDSqTP1

Changes Since Last Submission: We thank all reviewers for their valuable feedback. After carefully revising our manuscript, we have implemented the following changes:
1. We slightly changed the formulations in the introduction so that ADA is presented not as the culmination of everything, but rather as the most famous representative of the family of generalist agents.
2. We carefully revised Examples 1 and 2, sharpening the focus on illustrating concepts rather than technical mathematical details. In doing so, we slightly changed our introduction of the task notion (i.e., the explanation of Eq. 1).
3. The biggest changes were in the section discussing performance measures. We made it a dedicated subsection (Section 2.3) and extended the individual paragraphs with further details and short examples. RL is now also discussed there, which led to an additional paragraph in the discussion of the "generalization" notion. Beyond that, the most significant changes are in the paragraph about adaptation speed. The introduction to this subsection also changed and now includes references and cross-references.
4. We carefully revised Section 3 to remove redundancies, particularly across the RL^2 (3.2), VariBAD (3.3), and ADA (3.5) sections. Across all subsections, we deleted the "Paradigm" headline, since it was one of the main sources of redundancy. Additionally, we shortened Section 3.1, particularly its introduction.
5. In Section 3, we carefully revised the Performance Analysis with a special focus on the performance measures in Section 2.3. We went through most of the related literature in order to provide the reader with more precise referencing and performance measures. Please note, however, that we had to derive the presented quantitative measures from figures whose scales provide information only up to a certain extent. As we have stated many times, this makes a precise quantitative performance analysis nearly impossible. We are nevertheless convinced that our revision of the Performance Analysis sections significantly improved their content and readability.
6. We went through the whole manuscript again to further improve readability and style.
7. To reduce redundancy, we merged Table 1 into Table 2.

Assigned Action Editor: ~Tim_Genewein1

Submission Number: 6536
