Abstract: A fundamental challenge in continuous-time optimal control (OC) is the efficient computation of adaptive policies when agents act in unknown, uncertain environments. Traditional OC methods, such as dynamic programming, scale and adapt poorly because of the curse of dimensionality and their reliance on fixed models of the environment. One approach to these issues is Model Predictive Control (MPC), which iteratively computes open-loop controls over a receding horizon. However, classical MPC algorithms typically also assume a fixed environment. Another approach is Reinforcement Learning (RL), which scales well to high-dimensional setups but is often sample inefficient. Certain RL methods can also be unreliable in highly stochastic continuous-time setups and may be unable to generalize to unseen environments. This paper presents the Deep Adaptive Regulator (DARE), which uses physics-informed neural network approximations of the agent's value function and policy that are trained online to adapt to unknown environments. To manage uncertainty about the environment, DARE optimizes an augmented reward objective that dynamically trades off exploration against exploitation. We show that our method effectively adapts to unseen environments in settings where "classical" RL fails and is suited to online adaptive decision-making in environments that change in real time.
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: No
Submission Number: 74