Abstract: Model-based reinforcement learning is an effective
approach for controlling an unknown system. It is based on
a longstanding pipeline familiar to the control community
in which one performs experiments on the environment to
collect a dataset, uses the resulting dataset to identify a model
of the system, and finally performs control synthesis using
the identified model. As interacting with the system may be
costly and time-consuming, targeted exploration is crucial for
developing an effective control-oriented model with minimal
experimentation. Motivated by this challenge, recent work has
begun to study finite-sample data requirements and sample-efficient
algorithms for the problem of optimal exploration in
model-based reinforcement learning. However, existing theory
and algorithms are limited to model classes that are linear
in the parameters. Our work instead focuses on models with
nonlinear parameter dependencies and presents the first
finite-sample analysis of an active learning algorithm suitable for
a general class of nonlinear dynamics. In certain settings,
the excess control cost of our algorithm achieves the optimal
rate, up to logarithmic factors. We validate our approach in
simulation, showcasing the advantage of active, control-oriented
exploration for controlling nonlinear systems.