A New, Physics-Based Continuous-Time Reinforcement Learning Algorithm with Performance Guarantees

Brent A. Wallace; Jennie Si

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Reinforcement Learning (RL), Continuous Time (CT), Optimal Control, Physics-Based

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: We introduce a new, physics-based continuous-time reinforcement learning (CT-RL) algorithm for control of affine nonlinear systems, a class that covers a wide range of well-motivated applications. Building on fundamental input/output control mechanisms, our approach uses the reference command input (RCI) as probing noise during learning. Given known physical dynamics of the environment, and by leveraging the structure of the Kleinman algorithm, our RCI-based CT-RL algorithm provides not only theoretical guarantees such as learning convergence, solution optimality, and closed-loop stability, but also well-behaved dynamic system responses and data efficiency during learning. Our results therefore advance beyond the two currently available classes of CT-RL approaches. The first class, adaptive dynamic programming (ADP) methods, features elegant theoretical results stemming from adaptive and optimal control, yet these methods have not been shown to synthesize meaningful controllers effectively. The second class, fitted value iteration (FVI) methods, which also represent the state-of-the-art (SOTA) deep RL (DRL) designs, has produced impressive learned solutions, yet its theoretical guarantees are still to be developed. We provide several evaluations to demonstrate that our RCI-based design leads to new SOTA CT-RL results.
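
For readers unfamiliar with the Kleinman algorithm mentioned in the abstract, the following is a minimal sketch of the classical Kleinman policy iteration in its simplest setting: a scalar linear system dx/dt = a*x + b*u with cost given by the integral of q*x^2 + r*u^2. This is not the RCI-based algorithm of the submission, which targets affine nonlinear systems; it only illustrates the alternation between policy evaluation (a Lyapunov equation) and policy improvement that the abstract refers to. The function names `kleinman_scalar` and `riccati_scalar` and all numeric values are hypothetical and chosen for illustration.

```python
import math


def kleinman_scalar(a, b, q, r, k0, iters=20):
    """Classical Kleinman iteration for dx/dt = a*x + b*u, u = -k*x.

    Assumes q > 0, r > 0, b != 0, and a stabilizing initial gain k0.
    Returns the final cost coefficient p, the final gain k, and the
    list of (p, k) pairs produced at each iteration.
    """
    if a - b * k0 >= 0:
        raise ValueError("k0 must stabilize the system (a - b*k0 < 0)")
    k = k0
    history = []
    for _ in range(iters):
        a_cl = a - b * k  # closed-loop dynamics under the current gain
        # Policy evaluation: solve the scalar Lyapunov equation
        #   2*a_cl*p + q + r*k^2 = 0
        p = (q + r * k * k) / (-2.0 * a_cl)
        # Policy improvement: k = r^{-1} * b * p
        k = b * p / r
        history.append((p, k))
    return p, k, history


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Hypothetical unstable plant (a > 0) with a stabilizing initial gain.
    p, k, _ = kleinman_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print(p, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

In this scalar case every iterate keeps the closed loop stable and the cost coefficient p decreases toward the Riccati solution, which mirrors the convergence, optimality, and stability properties that Kleinman's result establishes for the linear-quadratic problem.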

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8465
