Transfer Learning, Reinforcement Learning for Adaptive Control Optimization under Distribution Shift

Pankaj Rajak; Wojciech Kowalinski; Fei Wang

Published: 28 Oct 2023, Last Modified: 02 Apr 2024

Venue: DistShift 2023 Poster

Keywords: Transfer Learning, Reinforcement Learning, Optimal Control

TL;DR: Transfer learning in reinforcement learning for fraud risk optimization under distribution shift

Abstract:

Many control systems rely on a pipeline of machine learning models and hand-coded rules to make decisions. Because the operating environment changes, these rules require constant tuning to maintain optimal system performance. Reinforcement learning (RL) can automate the online optimization of rules based on incoming data, but it requires extensive training data and exploration, which limits its application to new rules or those with sparse data. Here, we propose a transfer learning approach called Learning from Behavior Prior (LBP) that enables fast, sample-efficient RL optimization by transferring knowledge from an expert controller. We demonstrate this approach by optimizing the rule thresholds in a simulated control pipeline across differing operating conditions. Our method converges 5x faster than vanilla RL and is more robust to distribution shift between the expert and target environments. LBP reduces negative impacts during live training, enabling automated optimization even for new controllers.
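The abstract does not specify how LBP is implemented, so the following is only a minimal illustrative sketch of the general idea of guiding threshold exploration with a prior derived from an expert controller. Everything in it is an assumption made for illustration: the bandit formulation, the softmax prior, the quadratic toy reward, and the names `behavior_prior`, `simulated_reward`, and `tune_threshold` are hypothetical and are not taken from the paper.

```python
import math
import random


def behavior_prior(thresholds, expert_threshold, temperature=0.1):
    """Hypothetical prior: softmax weights favoring thresholds near the expert's."""
    logits = [-abs(t - expert_threshold) / temperature for t in thresholds]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def simulated_reward(threshold, optimum, noise, rng):
    """Toy reward (assumption): highest at the unknown optimum, plus Gaussian noise."""
    return -(threshold - optimum) ** 2 + rng.gauss(0.0, noise)


def tune_threshold(thresholds, prior, optimum, steps=500, epsilon=0.2,
                   noise=0.05, seed=0):
    """Epsilon-greedy bandit whose exploration steps sample from the prior."""
    rng = random.Random(seed)
    counts = [0] * len(thresholds)
    values = [0.0] * len(thresholds)
    for _ in range(steps):
        tried = [i for i, c in enumerate(counts) if c > 0]
        if not tried or rng.random() < epsilon:
            # Explore near the expert's behavior instead of uniformly.
            arm = rng.choices(range(len(thresholds)), weights=prior, k=1)[0]
        else:
            # Exploit the best running-average reward among tried thresholds.
            arm = max(tried, key=lambda i: values[i])
        reward = simulated_reward(thresholds[arm], optimum, noise, rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    tried = [i for i, c in enumerate(counts) if c > 0]
    return thresholds[max(tried, key=lambda i: values[i])]


# The expert was tuned at 0.5; the target optimum is shifted to 0.6.
thresholds = [i / 20 for i in range(21)]
prior = behavior_prior(thresholds, expert_threshold=0.5)
best = tune_threshold(thresholds, prior, optimum=0.6)
```

In this sketch the prior concentrates exploration around the expert's threshold, which avoids trying extreme thresholds early on; the `temperature` parameter controls how far the agent may stray from the expert when the target environment has shifted.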

Submission Number: 83
