The Optimal Explorer Hypothesis and Its Formulation as a Combinatorial Optimization Problem

Published: 20 Mar 2025, Last Modified: 27 Mar 2025 · MAEB 2025 Proyectos · Readers: Everyone · CC BY 4.0
Supplementary Material: zip
Keywords: agent, open-endedness, reinforcement learning, continual learning, lifelong learning
Abstract: This research project explores the hypothesis that, given a bounded number of steps in an environment, agents that most efficiently optimize their model of the environment are more likely to induce emergent intelligent behavior in a reward-free scenario. We refer to this as the optimal explorer hypothesis. The project aims to formalize and analyze this hypothesis, investigating its theoretical implications and connections to related areas such as open-ended learning and active inference. Building on this foundation, we will develop a practical implementation of an approximate "optimal explorer" agent by formulating it as a combinatorial optimization problem and leveraging established combinatorial optimization methods. Finally, we will conduct extensive experiments to evaluate whether the proposed agent induces emergent behaviors in diverse and challenging environments.
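The abstract does not state the objective or the solver behind the combinatorial formulation. One way to picture what "a bounded number of steps" plus "combinatorial optimization" could mean is the following sketch, built on explicit assumptions: the agent plans a fixed-length action sequence over a finite action set, the search space is the set of all such sequences, and the score is a stand-in for model improvement. Everything concrete here is hypothetical and not taken from the project: the toy 5 x 5 deterministic grid, the functions `step`, `coverage`, and `hill_climb`, the use of state coverage as a proxy for how much the agent's model improves, and the choice of single-position hill climbing as the solver. The planner is also given the grid dynamics, which a real reward-free explorer would not have.

```python
import random
from typing import List, Tuple

# Hypothetical toy environment (assumption): a deterministic width x height grid.
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}  # up, down, right, left

State = Tuple[int, int]


def step(state: State, action: int, width: int, height: int) -> State:
    """Apply one action; moves that would leave the grid are clipped to the border."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), width - 1)
    y = min(max(state[1] + dy, 0), height - 1)
    return (x, y)


def coverage(plan: List[int], start: State, width: int, height: int) -> int:
    """Proxy objective (assumption): number of distinct states visited by the plan,
    standing in for how much new transition data the agent would collect."""
    s = start
    seen = {s}
    for a in plan:
        s = step(s, a, width, height)
        seen.add(s)
    return len(seen)


def hill_climb(budget: int, start: State = (0, 0), width: int = 5, height: int = 5,
               iters: int = 2000, seed: int = 0) -> Tuple[List[int], int]:
    """Local search over action sequences of length `budget` (the step bound)."""
    rng = random.Random(seed)
    plan = [rng.randrange(len(ACTIONS)) for _ in range(budget)]
    best = coverage(plan, start, width, height)
    for _ in range(iters):
        # Neighbourhood move: change the action at one randomly chosen position.
        i = rng.randrange(budget)
        candidate = plan.copy()
        candidate[i] = rng.randrange(len(ACTIONS))
        score = coverage(candidate, start, width, height)
        if score >= best:  # accept ties so the search can drift across plateaus
            plan, best = candidate, score
    return plan, best


if __name__ == "__main__":
    plan, score = hill_climb(budget=12)
    print(score, plan)
```

With a budget of 12 steps the coverage can never exceed 13 states, so the returned score is bounded by the step budget, mirroring the "bounded number of steps" constraint. Any other standard metaheuristic (simulated annealing, tabu search, an evolutionary algorithm) could replace `hill_climb` without changing the encoding of plans as fixed-length action sequences.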
Submission Number: 8