Consistency Trajectory Planning: High-Quality and Efficient Trajectory Optimization for Offline Model-Based Reinforcement Learning

TMLR Paper5368 Authors

13 Jul 2025 (modified: 19 Dec 2025). Decision pending for TMLR. License: CC BY 4.0
Abstract: This paper introduces Consistency Trajectory Planning (CTP), a novel offline model-based reinforcement learning method that leverages the recently proposed Consistency Trajectory Model (CTM) for efficient trajectory optimization. While prior work applying diffusion models to planning has demonstrated strong performance, it often suffers from high computational costs due to iterative sampling procedures. CTP supports few-step trajectory generation without significant degradation in policy quality. We evaluate CTP on the D4RL benchmark and show that it consistently outperforms existing diffusion-based planning methods in long-horizon, goal-conditioned tasks. Notably, CTP achieves higher normalized returns while using fewer denoising steps. In particular, CTP attains comparable—or even superior—performance with reduced inference cost, highlighting its practicality and effectiveness for high-performance, low-latency offline planning.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. Added new ablation studies in Section 5.4 (Table 6) and Appendix B (Table 8), including controlled comparisons with DD-improved and CP-improved variants to address the reviewer’s concern regarding inference efficiency.
2. Included D-QL results in all comparative tables (Tables 2, 4, and 5) for completeness and fairness in baseline comparisons.
3. Revised main claims in the Abstract and Section 5.3 to more accurately reflect the efficiency comparison and avoid overstatement.
4. Clarified implementation details of DD-improved and CP-improved to highlight that they share the same architecture, horizon, stride, and value-filtering configuration as CTP.
5. Added inference-time statistics to key tables to provide a clearer picture of computational efficiency.
6. Refined writing and formatting throughout Sections 5.3 and 5.4 for clarity and consistency with the new results.
Assigned Action Editor: ~Matteo_Papini1
Submission Number: 5368