Rule-Based Grid World Exploration under Uncertainty

ICLR 2026 Conference Submission20001 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: intrinsic rewards, inductive biases, planning, uncertainty, deep reinforcement learning, reinforcement learning
TL;DR: A data-efficient rule-based learning agent for grid world environments
Abstract: Grid world environments expose core challenges in sequential decision-making, including planning under partial observability and sample-efficient generalization. Current Deep Reinforcement Learning methods often require millions of interactions in these structured domains and struggle to capture the causal dependencies critical for efficient adaptation. We present a novel experiential learning agent with a causally-informed intrinsic reward that learns sequential and causal dependencies in a robust and data-efficient way within grid world environments. We review state-of-the-art Deep Reinforcement Learning algorithms, discuss common techniques, and report our own systematic comparison across multiple grid world environments. We also investigate the conditions and mechanisms that lead to data-efficient learning and analyze the inductive biases our agent uses to learn causal knowledge effectively and to plan toward rewarding future states of greatest expected return.
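The abstract does not specify how the causally-informed intrinsic reward is computed; as a hedged illustration only, the sketch below shows one common way an intrinsic bonus can drive exploration in a grid world: a count-based novelty term (here a stand-in for the paper's causal uncertainty signal) added to extrinsic reward during a greedy one-step plan. All names (`GridAgent`, `beta`, `plan_step`) are hypothetical and not taken from the submission.

```python
import math

class GridAgent:
    """Illustrative sketch (not the paper's method): intrinsic-reward
    exploration in a grid world, using visit counts as a simple proxy
    for uncertainty."""

    def __init__(self, width, height, beta=1.0):
        self.width, self.height = width, height
        self.beta = beta    # weight on the intrinsic novelty bonus
        self.counts = {}    # state (x, y) -> visit count

    def intrinsic_reward(self, state):
        # Novelty bonus that decays with visitation: beta / sqrt(n + 1).
        n = self.counts.get(state, 0)
        return self.beta / math.sqrt(n + 1)

    def visit(self, state):
        # Record one visit to `state`.
        self.counts[state] = self.counts.get(state, 0) + 1

    def neighbors(self, state):
        # Reachable grid cells via the four cardinal moves.
        x, y = state
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.width and 0 <= ny < self.height:
                yield (nx, ny)

    def plan_step(self, state, extrinsic):
        # Greedy one-step plan: choose the neighbor maximizing
        # extrinsic reward plus the intrinsic novelty bonus.
        return max(self.neighbors(state),
                   key=lambda s: extrinsic.get(s, 0.0) + self.intrinsic_reward(s))
```

Under this sketch, an unvisited neighbor carries a larger bonus than a visited one, so the agent is pulled toward less-explored states when extrinsic rewards are equal.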
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 20001