Adaptable Safe Policy Learning from Multi-task Data with Constraint Prioritized Decision Transformer
Keywords: Reinforcement Learning, Safe Reinforcement Learning, Offline Reinforcement Learning, Multi-task Reinforcement Learning
TL;DR: This paper introduces CoPDT, which uses a single unified, adaptable Decision Transformer (DT) model for multi-task (multi-budget or multi-constraint) offline safe RL.
Abstract: Learning safe reinforcement learning (RL) policies from offline multi-task datasets, without direct environmental interaction, is crucial for the efficient and reliable deployment of RL agents. Benefiting from their scalability and strong in-context learning capabilities, recent approaches employ Decision Transformer (DT) architectures for offline safe RL and demonstrate promising adaptability across varying safety budgets.
However, these methods primarily focus on single-constraint scenarios and struggle with diverse constraint configurations across multiple tasks.
Additionally, their reliance on heuristically defined Return-To-Go (RTG) inputs limits flexibility and reduces learning efficiency, particularly in complex multi-task environments. To address these limitations, we propose CoPDT, a novel DT-based framework designed to enhance adaptability to diverse constraints and varying safety budgets. Specifically, CoPDT introduces a constraint prioritized prompt encoder, which leverages sparse binary cost signals to accurately identify constraints, and a constraint prioritized Return-To-Go (CPRTG) token mechanism, which dynamically generates RTGs based on identified constraints and corresponding safety budgets. Extensive experiments on the OSRL benchmark demonstrate that CoPDT achieves superior efficiency and significantly enhanced safety compliance across diverse multi-task scenarios, surpassing state-of-the-art DT-based methods by satisfying safety constraints in more than twice as many tasks.
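The abstract does not give the exact form of the CPRTG token. As a rough, hypothetical illustration only (the function name and formula below are assumptions, not the paper's method), a budget-conditioned cost-return-to-go signal of the kind common in offline safe RL can be sketched as a token that starts at the safety budget and shrinks as per-step costs accrue:

```python
import numpy as np

def cprtg_tokens(costs, budget):
    """Hypothetical sketch of a budget-conditioned cost-return-to-go token.

    At each timestep the token equals the safety budget minus the cost
    already incurred before that step, clipped at zero. This is an
    illustrative stand-in, not the CPRTG mechanism from the paper.
    """
    costs = np.asarray(costs, dtype=float)
    # Cost incurred strictly before each timestep: 0 at t=0, then cumulative.
    incurred_before = np.concatenate([[0.0], np.cumsum(costs)[:-1]])
    return np.clip(budget - incurred_before, 0.0, None)
```

Under this toy convention, a trajectory with costs [1, 0, 2, 1] and budget 3 yields tokens [3, 2, 2, 0]: the conditioning signal tightens as the remaining budget is consumed.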
Supplementary Material: zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 16812