Jointly Learning Identification and Control for Few-Shot Policy Adaptation

Nina Wiedemann; Antonio Loquercio; Matthias Müller; Rene Ranftl; Davide Scaramuzza

29 Sept 2021 (modified: 13 Feb 2023), ICLR 2022 Conference Withdrawn Submission, Readers: Everyone

Keywords: policy learning, control, system identification, few-shot domain adaptation

Abstract: Complex dynamical systems are challenging to model and control. When deployed outside controlled conditions, they may be subject to disturbances that cannot be predicted in advance, e.g., wind, a payload, or environment-specific forces. Adapting to such disturbances with a limited sample budget is difficult, especially for systems with many degrees of freedom. This paper introduces a theoretical framework to model this problem. We show that the expected error of a sensorimotor controller can be bounded by two components: the optimality of the controller and the domain gap between training and testing due to unmodelled dynamic effects. These components are usually minimized separately: the former with online or offline optimization, the latter with system identification. Motivated by this observation, we propose a differentiable programming approach to jointly minimize model and control errors with gradient descent. Like model-based methods, our algorithm learns from prior knowledge about the system, but it grounds the model to account for observed disturbances, thereby favouring sample efficiency. At the same time, it retains the flexibility of model-free methods, which can be applied to generic systems with arbitrary inputs. We evaluate our approach on several complex systems and tasks, and experimentally analyze its advantages over model-free and model-based methods in terms of performance and sample efficiency.

One-sentence Summary: A framework for joint system identification and policy learning for sample-efficient domain adaptation of sensorimotor policies.

Supplementary Material: zip
