Learning to Control on the Fly

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: online decision making, convergence, regret bound, bounded random noise
Abstract: This paper proposes an algorithm that learns to control on the fly. The algorithm has no access to the transition law of the environment, which is in fact linear with bounded random noise, and learns to make decisions directly online, without training phases or sub-optimal policies as initial input. Rather than estimating the system parameters or the value functions online, the algorithm adapts the ellipsoid method to the online decision-making setting. Whenever the feasibility of the decision variable is violated, a linear constraint is added, collapsing the volume of the decision-variable domain. We upper-bound the number of online linear constraints needed for the state to converge to a neighborhood of the desired state under the bounded random state noise, and we prove that the algorithm attains constant bounded online regret for a certain range of the noise bound.
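The abstract's core mechanism is an ellipsoid-method cut: whenever a constraint is found to be violated, the feasible region for the decision variable is intersected with a half-space and re-covered by a smaller ellipsoid. The paper's exact update is not given here, so the sketch below uses the standard central-cut ellipsoid formula purely as an illustration; the function name `ellipsoid_cut` and the test vectors are assumptions, not from the paper.

```python
import numpy as np

def ellipsoid_cut(c, P, a):
    """One standard central-cut ellipsoid update (illustrative, not the paper's exact rule).

    Given the ellipsoid E = {x : (x - c)^T P^{-1} (x - c) <= 1} and a violated
    linear constraint a^T x <= a^T c, return the minimum-volume ellipsoid
    containing the half-ellipsoid {x in E : a^T x <= a^T c}. Requires dim >= 2.
    """
    n = c.shape[0]
    g = a / np.sqrt(a @ P @ a)          # cut direction, normalized w.r.t. P
    Pg = P @ g
    c_new = c - Pg / (n + 1)            # center shifts into the kept half-space
    P_new = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg))
    return c_new, P_new

# Each cut shrinks the ellipsoid's volume by a factor of at most
# exp(-1/(2(n+1))), which is what drives the bound on the number of
# online constraints needed for convergence.
c = np.zeros(2)
P = np.eye(2)                            # start from the unit disk
c2, P2 = ellipsoid_cut(c, P, np.array([1.0, 0.0]))
print(np.linalg.det(P2) < np.linalg.det(P))   # volume strictly decreased
```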
One-sentence Summary: We propose an algorithm that learns to control on the fly without the transition law, training phases, or sub-optimal policies as input, and prove convergence and regret bounds when the ground-truth environment is linear with bounded random noise.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=RftyeivxFq