Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Stochastic Settings

Published: 28 Oct 2023, Last Modified: 04 Jan 2024, GenPlan'23
Abstract: Reinforcement Learning (RL) provides a convenient framework for sequential decision making when closed-form transition dynamics are unavailable and may change frequently. However, the high sample complexity of RL approaches limits their utility in the real world. This paper presents an approach that performs meta-level exploration in the space of models and uses the learned models to compute policies. Our approach interleaves learning and planning, allowing data-efficient, task-focused sample collection in the presence of non-stationarity. We conduct an empirical evaluation on benchmark domains and show that our approach significantly outperforms baselines in sample complexity and adapts easily to transition systems that change across tasks.
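The "meta-level exploration in the space of models" idea can be illustrated with a minimal sketch: maintain an ensemble of learned transition models and direct exploration toward state-action pairs where the ensemble disagrees (high epistemic uncertainty). All names below are illustrative assumptions, not the paper's actual algorithm or code.

```python
# Hypothetical sketch: disagreement-based exploration over an ensemble
# of learned transition models for a small discrete MDP. Illustrative
# only; not the authors' implementation.

class TabularModel:
    """Count-based estimate of P(s' | s, a)."""
    def __init__(self):
        self.counts = {}  # (s, a) -> {s': count}

    def update(self, s, a, s2):
        self.counts.setdefault((s, a), {}).setdefault(s2, 0)
        self.counts[(s, a)][s2] += 1

    def predict(self, s, a):
        dist = self.counts.get((s, a))
        if not dist:
            return None  # transition never observed
        total = sum(dist.values())
        return {s2: c / total for s2, c in dist.items()}


def disagreement(models, s, a):
    """Epistemic-uncertainty proxy: spread of the ensemble's predictions."""
    preds = [m.predict(s, a) for m in models]
    if any(p is None for p in preds):
        return float('inf')  # untried transition: maximally uncertain
    states = set().union(*preds)
    return sum(max(p.get(x, 0.0) for p in preds) -
               min(p.get(x, 0.0) for p in preds) for x in states)


def explore_step(models, s, actions):
    """Pick the action the ensemble disagrees on most."""
    return max(actions, key=lambda a: disagreement(models, s, a))
```

In this sketch, interleaving would alternate between `explore_step` to collect informative transitions, `update` on each model with the observed outcomes, and planning (e.g., value iteration) on the current models; under non-stationarity, renewed ensemble disagreement would re-trigger exploration where the dynamics have changed.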
Submission Number: 77