Hi-CoLA: High Confidence Lower Bound Approximation Based Reinforcement Learning for Flex-Route Transit Operation Control
Keywords: Safe Reinforcement Learning, High Confidence RL, Imitation Learning, Intelligent Transportation, Offline Reinforcement Learning
TL;DR: We propose Hi-CoLA, an RL framework for safety- and cost-critical real-world decision-making applications that integrates a user-defined confidence level into the training process to provide statistically guaranteed policy performance improvement.
Abstract: Reinforcement Learning (RL) has demonstrated impressive empirical success, yet its adoption in safety- and cost-critical domains remains limited by a fundamental gap: trained policies lack statistically certified performance guarantees prior to deployment. Consequently, safety concerns arise when deploying RL in real-world environments, leading to a growing demand for safe RL algorithms. In this work, we propose the High Confidence Lower Bound Approximation (Hi-CoLA) framework, a confidence-integrated learning framework for decision-making in safety- and cost-critical environments. Specifically, we use behavioral cloning to transform rule-based decision-making processes into a parameterized policy network, and then employ Hi-CoLA to robustly improve both the confidence lower bound and the overall performance of this baseline policy toward optimality, with a performance guarantee for real-world deployment. As a result of this robust training process, the framework is well suited for real-world deployment. We compare the performance of Hi-CoLA with state-of-the-art safe RL and offline RL approaches in the context of Flex-Route Transit (FRT), an intelligent demand-responsive transit system that requires real-time dynamic routing. Our approach enhances real-time control of FRT with guaranteed performance and is broadly applicable to other safety-critical decision-making scenarios.
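Illustration: The abstract does not state how the confidence lower bound is computed, so the following is only a minimal sketch of the general idea of a high-confidence lower bound on policy performance, not the Hi-CoLA method itself. It assumes episode returns are i.i.d., collected under the candidate policy, and bounded in a known range, and it uses Hoeffding's inequality as a stand-in bound. The function names, the acceptance rule, and all parameters (delta, r_min, r_max) are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_lower_bound(returns: Sequence[float], delta: float,
                          r_min: float, r_max: float) -> float:
    """One-sided (1 - delta) lower bound on the expected return.

    Assumption: returns are i.i.d. samples bounded in [r_min, r_max].
    Hoeffding's inequality is a stand-in here; the paper's own bound may differ.
    """
    n = len(returns)
    if n == 0:
        raise ValueError("need at least one return sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    mean = sum(returns) / n
    # Half-width of the one-sided Hoeffding interval.
    width = (r_max - r_min) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean - width


def accept_candidate(candidate_returns: Sequence[float], baseline_value: float,
                     delta: float, r_min: float, r_max: float) -> bool:
    """Hypothetical acceptance rule: keep the candidate policy only if its
    lower bound is at least the baseline policy's estimated performance."""
    bound = hoeffding_lower_bound(candidate_returns, delta, r_min, r_max)
    return bound >= baseline_value


if __name__ == "__main__":
    sample_returns = [0.5] * 200  # illustrative data, not results from the paper
    print(hoeffding_lower_bound(sample_returns, 0.05, 0.0, 1.0))
    print(accept_candidate(sample_returns, 0.4, 0.05, 0.0, 1.0))
```

Here delta plays the role of the user-defined confidence level: a smaller delta widens the interval, so a candidate policy needs more data or a larger margin over the baseline before it is accepted.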
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 27