Transfer Learning, Reinforcement Learning for Adaptive Control Optimization under Distribution Shift

Published: 28 Oct 2023, Last Modified: 02 Apr 2024. DistShift 2023 Poster.
Keywords: Transfer Learning, Reinforcement Learning, Optimal Control
TL;DR: TL in RL for Fraud Risk Optimization under Distribution Shift
Abstract: Many control systems rely on a pipeline of machine learning models and hand-coded rules to make decisions. However, due to changes in the operating environment, these rules require constant tuning to maintain optimal system performance. Reinforcement learning (RL) can automate the online optimization of rules based on incoming data. However, RL requires extensive training data and exploration, which limits its application to new rules or those with sparse data. Here, we propose a transfer learning approach called Learning from Behavior Prior (LBP) to enable fast, sample-efficient RL optimization by transferring knowledge from an expert controller. We demonstrate this approach by optimizing the rule thresholds in a simulated control pipeline across differing operating conditions. Our method converges 5x faster than vanilla RL, with greater robustness to distribution shift between the expert and target environments. LBP reduces negative impacts during live training, enabling automated optimization even for new controllers.
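The abstract's core idea, transferring an expert controller's behavior into the RL policy so threshold tuning starts from a safe, sample-efficient point, can be illustrated with a minimal sketch. The paper does not publish its implementation; everything below (the reward model, the Gaussian behavior prior, the regularization strength `beta`, and the 0.7 "optimal threshold") is an illustrative assumption. One common way to realize a behavior prior is to warm-start the policy at the expert's action distribution and add a KL pull back toward it during updates:

```python
# Illustrative sketch of a behavior-prior transfer for rule-threshold tuning.
# Assumed setup (not from the paper): a 1-D Gaussian policy over a single
# fraud-score threshold, a quadratic reward peaking at 0.7, and an expert
# prior estimated offline from the expert controller's past decisions.
import random

random.seed(0)

def reward(threshold: float) -> float:
    # Hypothetical environment: best trade-off at threshold = 0.7,
    # observed with small measurement noise.
    return -(threshold - 0.7) ** 2 + random.gauss(0.0, 0.01)

# Expert's behavior prior over thresholds (assumed values).
prior_mu, prior_sigma = 0.6, 0.1

# Learner's Gaussian policy, warm-started at the prior: the transfer step.
mu, sigma = prior_mu, prior_sigma
lr, beta = 0.01, 0.01  # learning rate; strength of the pull toward the prior

for step in range(2000):
    a = random.gauss(mu, sigma)  # explore around the current policy
    r = reward(a)
    # Score-function (REINFORCE) gradient of expected reward w.r.t. mu,
    # minus the gradient of beta * KL(policy || prior); for equal sigmas
    # the KL term reduces to beta * (mu - prior_mu) / sigma**2, which
    # keeps live exploration anchored near the expert's behavior.
    grad = r * (a - mu) / sigma ** 2 - beta * (mu - prior_mu) / sigma ** 2
    mu += lr * grad

print(round(mu, 2))
```

With these assumed constants the policy mean settles between the expert's 0.6 and the environment's 0.7 optimum, reflecting the trade-off the regularizer imposes: a larger `beta` stays closer to the expert (safer under distribution shift between expert and target environments), while a smaller `beta` lets incoming data dominate.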
Submission Number: 83