Safe Offline Reinforcement Learning using Trajectory-Level Diffusion Models

Ralf Römer; Lukas Brunke; Martin Schuck; Angela P. Schoellig

Published: 09 Apr 2024, Last Modified: 10 Apr 2024 | ICRA 2024: Back to the Future | CC BY 4.0

Keywords: Offline RL; Diffusion Models

TL;DR: We address offline RL by learning trajectory distributions using diffusion models and incorporate a projection method into the backward diffusion process to guarantee safety while ensuring dynamic feasibility.

Abstract: Despite its success in controlling robotic systems, reinforcement learning (RL) suffers from several issues that hinder its widespread adoption in real-world scenarios. Recently, diffusion models have emerged as a powerful tool to address some of the longstanding challenges in offline and model-based RL, improving long-horizon planning and facilitating multitask generalization. However, these algorithms are unsuitable for operating in unseen and dynamic environments where novel and time-varying constraints not represented in the training data may arise. To address this issue, we propose incorporating a projection scheme into diffusion-based trajectory generation. Our approach uses the iterative nature of diffusion models and alternates the conditional backward diffusion process with a projection of the noisy trajectory onto the constraint set. As a result, we can generate trajectories that are both safe and dynamically feasible while still achieving high reward. We evaluate our approach for goal-conditioned offline RL for two simulated robotic systems navigating in environments with static and dynamic obstacles, representing novel test-time constraints. We show that our method can satisfy these constraints in closed loop, greatly increasing the success rate of reaching the goal.
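The alternation described in the abstract (one backward diffusion step, then a projection of the noisy trajectory onto the constraint set) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned conditional denoiser is replaced by a hypothetical stand-in (`toy_denoise_step`) that pulls waypoints toward a straight line between start and goal, the constraint set is assumed to be the exterior of a single circular obstacle, and the projection acts on 2D waypoints only. It does not model system dynamics, so it illustrates the safety projection but not dynamic feasibility. All function names and parameters are assumptions made for illustration.

```python
import numpy as np

def project_outside_disc(traj, center, radius):
    """Euclidean projection of each waypoint onto the exterior of a disc."""
    diff = traj - center
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    # Unit direction from the centre; fall back to +x if a waypoint sits on the centre.
    direction = np.where(dist > 1e-12, diff / np.maximum(dist, 1e-12),
                         np.array([1.0, 0.0]))
    # Waypoints inside the disc are moved to the nearest boundary point.
    return np.where(dist < radius, center + radius * direction, traj)

def toy_denoise_step(traj, start, goal, k, num_steps, rng):
    """Hypothetical stand-in for a learned, goal-conditioned reverse-diffusion step."""
    target = np.linspace(start, goal, len(traj))  # straight-line "clean" trajectory
    alpha = 1.0 / (k + 1)                         # step size reaches 1 at the final step
    noise_scale = 0.05 * k / num_steps            # injected noise vanishes at k = 0
    noise = noise_scale * rng.standard_normal(traj.shape)
    return traj + alpha * (target - traj) + noise

def sample_safe_trajectory(start, goal, center, radius,
                           horizon=32, num_steps=50, seed=0):
    """Alternate a denoising step with a projection onto the constraint set."""
    start, goal, center = (np.asarray(a, dtype=float) for a in (start, goal, center))
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal((horizon, 2))      # start from pure noise
    for k in reversed(range(num_steps)):
        traj = toy_denoise_step(traj, start, goal, k, num_steps, rng)
        traj = project_outside_disc(traj, center, radius)  # enforce the constraint
    return traj

if __name__ == "__main__":
    traj = sample_safe_trajectory([-1.0, 0.0], [1.0, 0.0], [0.0, 0.0], 0.3)
    print(np.linalg.norm(traj, axis=1).min())     # smallest distance to the obstacle centre
```

Because the projection is applied after every step, including the last one, every returned waypoint lies outside the obstacle regardless of what the denoiser proposes. The sketch checks waypoints only, so the segments between consecutive waypoints may still cut through the obstacle; a time-varying obstacle could be handled in the same loop by passing a per-waypoint `center` array of shape `(horizon, 2)`.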

Submission Number: 5
