Keywords: reinforcement learning, generative models, offline RL, sequential decision making, modularity
TL;DR: Decision Stacks is a modular generative framework for goal-conditioned RL that expressively models observations, rewards, and actions, leading to superior performance and flexibility across diverse offline RL tasks in MDPs and POMDPs.
Abstract: Deploying reinforcement learning algorithms in real-world scenarios presents numerous challenges, such as handling complex goals, planning future observations and actions, and critiquing their utilities. Meeting these challenges demands a balance between expressivity and flexible modeling for efficient learning and inference.
We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into three generative modules that simulate the temporal evolution of observations, rewards, and actions.
Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferability across domains, and inference speed.
Our empirical results demonstrate the effectiveness of Decision Stacks for offline policy optimization for several MDP and POMDP environments.
Submission Number: 53