Scaling Safe Learning-based Control to Long-Horizon Temporal Tasks

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: neural network, control, signal temporal logic
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: This paper introduces a model-based approach for training parameterized policies for an autonomous agent operating in a highly nonlinear (albeit deterministic) environment. We desire the trained policy to ensure that the agent satisfies specific task objectives and safety constraints, both expressed in Signal Temporal Logic (STL). We show that this learning problem reduces to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives. This poses a challenge: RNNs are susceptible to vanishing and exploding gradients, and naive gradient-descent-based strategies for long-horizon task objectives therefore suffer from the same problems. To tackle this challenge, we introduce a novel gradient approximation algorithm based on the idea of gradient sampling, together with a smooth computation graph that provides a neurosymbolic encoding of STL formulas. We show that combining these two methods improves the quality of the stochastic gradient, enabling scalable backpropagation over trajectories with long time horizons. We demonstrate the efficacy of our approach on motion planning applications requiring complex spatio-temporal and sequential tasks spanning thousands of time steps.
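
As a rough illustration of the two ingredients named in the abstract, the sketch below shows (i) a smooth soft-min/soft-max encoding of STL robustness for the "always" and "eventually" operators, a standard way to make robustness semantics differentiable, and (ii) a gradient-sampling-style estimator that averages autograd gradients at random parameter perturbations (in the spirit of Burke-Lewis-Overton gradient sampling). This is a minimal sketch, not the authors' implementation; the function names, the temperature `temp`, and the sampling hyperparameters are illustrative assumptions.

import torch

def soft_max(x, temp=10.0, dim=-1):
    # Smooth upper bound on max: (1/temp) * logsumexp(temp * x)
    return torch.logsumexp(temp * x, dim=dim) / temp

def soft_min(x, temp=10.0, dim=-1):
    # Smooth lower bound on min, via -softmax(-x)
    return -soft_max(-x, temp=temp, dim=dim)

def always_robustness(signal, threshold, temp=10.0):
    # Smooth robustness of G(signal > threshold): softened
    # minimum over time of the margin (signal_t - threshold)
    return soft_min(signal - threshold, temp=temp)

def eventually_robustness(signal, threshold, temp=10.0):
    # Smooth robustness of F(signal > threshold): softened maximum
    return soft_max(signal - threshold, temp=temp)

def sampled_gradient(loss_fn, theta, n_samples=8, radius=1e-2):
    # Gradient-sampling-style estimate: average autograd gradients
    # of the loss at random perturbations of theta
    grads = []
    for _ in range(n_samples):
        pert = (theta + radius * torch.randn_like(theta)).detach().requires_grad_(True)
        loss = loss_fn(pert)
        g, = torch.autograd.grad(loss, pert)
        grads.append(g)
    return torch.stack(grads).mean(dim=0)

# Toy usage: maximize the worst-case margin of a 1000-step signal
# above zero (i.e., the robustness of G(signal > 0)).
theta = torch.randn(1000)
loss_fn = lambda th: -always_robustness(th, threshold=0.0)
g = sampled_gradient(loss_fn, theta)

Larger `temp` values tighten the approximation of the true min/max at the cost of steeper gradients; in long-horizon settings such as those targeted by the paper, the smoothed operators keep gradient signal flowing through every time step rather than only through the single argmin/argmax step.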
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8232