TL;DR: training-free method that corrects off-manifold deviations in diffusion-based planners
Abstract: Recent advances in diffusion-based generative modeling have demonstrated significant promise in tackling long-horizon, sparse-reward tasks by leveraging offline datasets. While these approaches have achieved promising results, their reliability remains inconsistent due to the inherent stochastic risk of producing infeasible trajectories, limiting their use in safety-critical settings. We identify that the primary cause of these failures is inaccurate guidance during the sampling procedure, and demonstrate the existence of manifold deviation by deriving a lower bound on the guidance gap. To address this challenge, we propose *Local Manifold Approximation and Projection* (LoMAP), a *training-free* method that projects the guided sample onto a low-rank subspace approximated from offline datasets, preventing infeasible trajectory generation. We validate our approach on standard offline reinforcement learning benchmarks that involve challenging long-horizon planning. Furthermore, we show that, as a standalone module, LoMAP can be incorporated into hierarchical diffusion planners, providing further performance enhancements.
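The core projection step described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes the local manifold is approximated by a PCA subspace fit to the nearest-neighbor trajectories in the offline dataset, and the function name `lomap_project` and its parameters (`k`, `rank`) are invented for this example.

```python
import numpy as np

def lomap_project(x, dataset, k=16, rank=4):
    """Hypothetical sketch of a LoMAP-style projection step.

    x       : (d,) guided diffusion sample (flattened trajectory)
    dataset : (N, d) offline trajectories
    Fits a rank-`rank` PCA subspace to the k nearest neighbors of x
    in the dataset, then orthogonally projects x onto that local
    affine subspace, pulling the sample back toward the data manifold.
    """
    # k nearest neighbors of the current sample in the offline dataset
    dists = np.linalg.norm(dataset - x, axis=1)
    nbrs = dataset[np.argsort(dists)[:k]]

    # local low-rank affine subspace via PCA of the neighbors
    mu = nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
    basis = Vt[:rank]  # (rank, d) principal directions

    # orthogonal projection of x onto the local subspace
    return mu + (x - mu) @ basis.T @ basis
```

In a guided diffusion sampler, a step like this would typically be applied after each guided denoising update, so that off-manifold deviations introduced by inaccurate guidance are corrected before the next step.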
Lay Summary: Artificial intelligence (AI) is increasingly used to plan tasks that require multiple steps, like navigating robots through complex environments. But a common problem arises: AI planners sometimes suggest unrealistic or impossible paths, which can be dangerous or ineffective.
We propose a simple approach called LoMAP that helps AI planners stay closer to realistic outcomes. Instead of allowing the AI to drift away from feasible solutions, our method regularly checks and gently corrects its plans to ensure they remain practical and safe.
By testing LoMAP on tasks such as robot navigation and movement control, we found it significantly reduces the number of unrealistic solutions generated. This improvement helps make AI-driven decision-making more reliable, especially in safety-critical situations.
Link To Code: https://github.com/leekwoon/lomap
Primary Area: Reinforcement Learning->Planning
Keywords: Offline Reinforcement Learning, Trajectory Optimization, Diffusion Models, Sequential Decision Making
Submission Number: 13073