Koopman Spectrum Nonlinear Regulators and Efficient Online Learning

Published: 23 Jun 2024, Last Modified: 23 Jun 2024
Accepted by TMLR
Abstract: Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The optimized motions are often ‘unnatural’, representing, for example, behaviors with sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm of controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some dynamics characterizations that are not possible with a cumulative cost are feasible in this paradigm, which generalizes the classical eigenstructure and pole assignments to nonlinear decision making. Moreover, we present a sample efficient online learning algorithm for our problem that enjoys a sub-linear regret bound under some structural assumptions.
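To make the notion of a cost over the Koopman operator concrete, the following is a minimal illustrative sketch (not the paper's method): a least-squares (DMD-style) estimate of a Koopman matrix from snapshot data, with a hypothetical spectrum cost that penalizes eigenvalues whose magnitude exceeds one. The function names and the particular cost are assumptions chosen for illustration.

```python
import numpy as np

def koopman_matrix(X, Y):
    """Least-squares estimate K of a (finite) Koopman operator:
    Y ≈ K @ X, where the columns of X and Y are successive snapshots."""
    return Y @ np.linalg.pinv(X)

def spectrum_cost(K):
    """Illustrative spectrum cost: penalize unstable modes by the amount
    eigenvalue magnitudes exceed one (zero for a stable spectrum)."""
    eigvals = np.linalg.eigvals(K)
    return float(np.sum(np.maximum(np.abs(eigvals) - 1.0, 0.0)))

# Toy example: a stable linear system x_{t+1} = A x_t observed directly,
# so the estimated Koopman matrix recovers A itself.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
x = rng.standard_normal(2)
traj = [x]
for _ in range(50):
    x = A @ x
    traj.append(x)
traj = np.array(traj).T                      # shape (2, 51)
K = koopman_matrix(traj[:, :-1], traj[:, 1:])
print(spectrum_cost(K))                      # eigenvalues 0.9, 0.8 → cost 0.0
```

Minimizing such a cost shapes the spectrum of the closed-loop dynamics as a whole, rather than accumulating a per-step penalty along the trajectory.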
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Following the reviewers' comments, we have made the following changes and additions:
1. We added a description with an illustrative picture showing an analogy to Fourier analysis, describing the relation between the cumulative cost and the Koopman spectrum cost from a different angle.
2. We added clarifying text regarding (1) the design of the KSNR optimization objective, (2) how the objective relates to controller design in control problems, (3) how CEM search heuristically solves this objective, (4) population-based policy search, (5) the term "regulator", (6) the term "model-based" in our context, (7) what "unpredictability" means in the introduction, and (8) how the matrix realization $\mathscr{K}$ is obtained through CEM search.
3. We added remarks for each assumption in the learning algorithm section, with further clarification on the necessity of the exploration in Assumption 5.
4. We clarified how the adversary's choice affects the regret.
5. We added further clarification on the relation to KNR.
6. We added relations to the skill learning literature, including motor primitives.
7. We added more numerical experiments, showing that the cartpole experiment is intended to demonstrate systematic stabilization rather than oscillation (as also explained via the analogy to Fourier analysis), the qualitative difference between smoothness enforcement by the Koopman spectrum cost and by the action cost, and how the choice of $\mathcal{H}_0$ affects the extracted motions.
8. We added a broader impact section and expanded the limitations section.
9. We added an additional simple linear-system experiment for the learning algorithm.
10. We fixed other minor typos and clarified some passages.
Video: https://sites.google.com/view/ksnr-dynamics/
Code: https://sites.google.com/view/ksnr-dynamics/
Supplementary Material: zip
Assigned Action Editor: ~Amir-massoud_Farahmand1
Submission Number: 2029