The Optimal Explorer Hypothesis and Its Formulation as a Combinatorial Optimization Problem

Mikel Malagón; Josu Ceberio; Jon Vadillo; Jose A. Lozano

Published: 20 Mar 2025, Last Modified: 27 Mar 2025

Venue: MAEB 2025 Proyectos

License: CC BY 4.0

Supplementary Material: zip

Keywords: agent, open-endedness, reinforcement learning, continual learning, lifelong learning

Abstract: This research project explores the hypothesis that, given a bounded number of steps in an environment, agents that most efficiently optimize their model of the environment are more likely to induce emergent intelligent behavior in a reward-free scenario. We refer to this as the optimal explorer hypothesis. The project aims to formalize and analyze this hypothesis, investigating its theoretical implications and connections to related areas such as open-ended learning and active inference. Building on this foundation, we will develop a practical implementation of an approximate "optimal explorer" agent by formulating it as a combinatorial optimization problem and leveraging established methods from the field. Finally, we will conduct extensive experiments to evaluate whether the proposed agent induces emergent behaviors in diverse and challenging environments.

Submission Number: 8
