A Policy Optimization Method Towards Optimal-time Stability

Shengjie Wang; Lan Fengb; Xiang Zheng; Yuxue Cao; Oluwatosin OluwaPelumi Oseni; Haotian Xu; Tao Zhang; Yang Gao

Published: 30 Aug 2023, Last Modified: 08 Oct 2023, CoRL 2023 Poster, Readers: Everyone

Keywords: Reinforcement Learning, Robotic Control, Stability

Abstract: In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.
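To make the phrase "sampling-based Lyapunov stability" concrete, here is a minimal sketch of how a Lyapunov decrease condition can be estimated from sampled transitions. It is a generic illustration and not the ALAC algorithm: the quadratic candidate `quadratic_lyapunov`, the helper `mean_decrease_violation`, the decay rate `alpha`, and the toy linear dynamics are all assumptions introduced here, and the abstract does not state the exact condition the paper uses.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]


def quadratic_lyapunov(state: State) -> float:
    """Hypothetical candidate L(s) = ||s||^2, which is zero only at the origin."""
    return sum(x * x for x in state)


def mean_decrease_violation(
    transitions: Sequence[Tuple[State, State]],
    lyapunov: Callable[[State], float],
    alpha: float,
) -> float:
    """Sample mean of L(s') - (1 - alpha) * L(s).

    A non-positive value means the sampled transitions satisfy, on average,
    the assumed decrease condition L(s') - L(s) <= -alpha * L(s).
    """
    if not transitions:
        raise ValueError("at least one transition is required")
    total = 0.0
    for state, next_state in transitions:
        total += lyapunov(next_state) - (1.0 - alpha) * lyapunov(state)
    return total / len(transitions)


# Toy data (assumed): s' = 0.5 * s contracts toward the origin, s' = 2 * s diverges.
states = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
contracting = [(s, tuple(0.5 * x for x in s)) for s in states]
expanding = [(s, tuple(2.0 * x for x in s)) for s in states]

print(mean_decrease_violation(contracting, quadratic_lyapunov, alpha=0.5))  # negative
print(mean_decrease_violation(expanding, quadratic_lyapunov, alpha=0.5))    # positive
```

In an actor-critic setting, a quantity of this kind would typically enter the policy update as a constraint or penalty term. A condition of this form only bounds the rate of decrease, which is the sort of infinite-time guarantee the abstract contrasts with reaching the equilibrium within an optimal time.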

Student First Author: yes

Website: https://sites.google.com/view/adaptive-lyapunov-actor-critic
