Safe Offline Reinforcement Learning using Trajectory-Level Diffusion Models

Published: 09 Apr 2024 · Last Modified: 10 Apr 2024 · ICRA 2024 Workshop: Back to the Future · License: CC BY 4.0
Keywords: Offline RL; Diffusion Models
TL;DR: We address offline RL by learning trajectory distributions with diffusion models, incorporating a projection step into the backward diffusion process to guarantee safety while preserving dynamic feasibility.
Abstract: Despite its success in controlling robotic systems, reinforcement learning (RL) suffers from several issues that hinder its widespread adoption in real-world scenarios. Recently, diffusion models have emerged as a powerful tool for addressing some of the longstanding challenges in offline and model-based RL, improving long-horizon planning and facilitating multitask generalization. However, these algorithms are unsuitable for operating in unseen and dynamic environments, where novel, time-varying constraints not represented in the training data may arise. To address this issue, we propose incorporating a projection scheme into diffusion-based trajectory generation. Our approach exploits the iterative nature of diffusion models, alternating the conditional backward diffusion process with a projection of the noisy trajectory onto the constraint set. As a result, we can generate trajectories that are both safe and dynamically feasible while still achieving high reward. We evaluate our approach on goal-conditioned offline RL for two simulated robotic systems navigating environments with static and dynamic obstacles, which represent novel test-time constraints. We show that our method satisfies these constraints in closed loop, greatly increasing the success rate of reaching the goal.
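The core idea of alternating denoising with a projection onto the constraint set can be sketched in a few lines. The sketch below is a heavily simplified illustration, not the paper's implementation: the learned denoiser is replaced by a schematic shrink-toward-a-line step, the constraint set is a hypothetical circular obstacle, and all function names (`denoise_step`, `project_outside_circle`, `safe_sample`) are invented for this example.

```python
import numpy as np

def project_outside_circle(traj, center, radius):
    """Project waypoints out of a circular obstacle (the constraint set is the
    obstacle's complement). Hypothetical constraint, for illustration only."""
    traj = traj.copy()
    d = traj - center
    dist = np.linalg.norm(d, axis=1, keepdims=True)
    inside = (dist < radius).ravel()
    # Push any violating waypoint radially onto the obstacle boundary.
    traj[inside] = center + d[inside] / dist[inside] * radius
    return traj

def denoise_step(traj, rng):
    """Stand-in for one conditional backward-diffusion step; a real model would
    predict the noise here. We merely shrink toward a start-goal straight line."""
    line = np.linspace(traj[0], traj[-1], len(traj))
    return 0.9 * traj + 0.1 * line + 0.01 * rng.standard_normal(traj.shape)

def safe_sample(n_steps=50, n_waypoints=32, seed=0):
    """Alternate denoising and projection, conditioning on start and goal."""
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal((n_waypoints, 2))
    start, goal = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
    center, radius = np.array([0.0, 0.0]), 0.3  # assumed obstacle
    for _ in range(n_steps):
        traj[0], traj[-1] = start, goal          # goal conditioning
        traj = denoise_step(traj, rng)
        traj[0], traj[-1] = start, goal
        traj = project_outside_circle(traj, center, radius)
    return traj, center, radius
```

Because the projection is applied after every backward step rather than once at the end, the final sample satisfies the constraint by construction while the denoiser keeps the trajectory close to the learned (here, schematic) distribution.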
Submission Number: 5