Abstract: This work presents a novel architecture for an open-ended learning system that integrates intrinsic motivation (IM) and classical planning to enable agents to continuously learn and improve their knowledge over time in an unsupervised fashion. The main goal is to allow the agent to autonomously distill its experience into Probabilistic Planning Domain Definition Language (PPDDL) terms, thereby making causal relationships explicit and supporting automated planning. Starting with a virtually empty set of predefined tasks or goals, the agent harnesses intrinsic motivation to explore the environment autonomously, continuously using and enriching the high-level knowledge acquired through its experience in a virtuous cycle. Experimental evaluation in the Treasure Game domain demonstrates the effectiveness of the proposed approach: starting with only a small set of primitive actions, we show how an agent can autonomously build and refine a high-level representation of the environment. Planning-based strategies grounded in this representation significantly outperform uninformed exploration, reaching intermediate sub-goals more efficiently and substantially reducing the time required to achieve the final objective.