Rule-Based Grid World Exploration under Uncertainty

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: intrinsic rewards, inductive biases, planning, uncertainty, deep reinforcement learning, reinforcement learning
TL;DR: A data-efficient rule-based learning agent for grid world environments
Abstract: Grid world environments expose core challenges in sequential decision-making, including planning under partial observability and sample-efficient generalization. Current Deep Reinforcement Learning methods often require millions of interactions in these structured domains and struggle to capture the causal dependencies critical for efficient adaptation. We present a novel experiential learning agent with a causally-informed intrinsic reward that learns sequential and causal dependencies robustly and data-efficiently in grid world environments. We review state-of-the-art Deep Reinforcement Learning algorithms, discuss common techniques, and report a systematic comparison across multiple grid world environments. We also investigate the conditions and mechanisms that enable data-efficient learning, and analyze the inductive biases our agent exploits to acquire causal knowledge and to plan toward future states of greatest expected return.
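To illustrate the general idea of an intrinsic reward driving exploration in a grid world, the sketch below shows a minimal tabular Q-learning agent whose reward is augmented with a count-based novelty bonus. This is a generic stand-in, not the paper's causally-informed reward; the class name, hyperparameters, and bonus formula are all illustrative assumptions.

```python
import random
from collections import defaultdict


class IntrinsicRewardAgent:
    """Tabular Q-learning with a count-based intrinsic bonus.

    A minimal sketch of intrinsic-reward exploration in grid worlds;
    all names and hyperparameters here are illustrative assumptions,
    not the submission's actual method.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, beta=0.5, epsilon=0.1):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.beta = beta        # intrinsic-bonus scale
        self.epsilon = epsilon  # exploration rate
        self.q = defaultdict(float)      # (state, action) -> value
        self.counts = defaultdict(int)   # state -> visit count

    def intrinsic_bonus(self, state):
        """Novelty bonus that decays as a state is revisited."""
        self.counts[state] += 1
        return self.beta / self.counts[state] ** 0.5

    def act(self, state):
        """Epsilon-greedy action selection over tabular Q-values."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, extrinsic_r, next_state, done):
        """One-step Q-learning update on extrinsic + intrinsic reward."""
        target = extrinsic_r + self.intrinsic_bonus(next_state)
        if not done:
            target += self.gamma * max(
                self.q[(next_state, a)] for a in self.actions
            )
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Because the bonus shrinks with visit counts, early interactions are biased toward unvisited cells, which is one common mechanism behind data-efficient exploration in sparse-reward grid worlds.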
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20001