A model-free robust policy iteration algorithm for optimal control of nonlinear systems

Shubhendu Bhasin; Marcus Johnson; Warren E. Dixon

Published: 01 Jan 2010, Last Modified: 14 May 2024, Venue: CDC 2010, License: CC BY-SA 4.0

Abstract: An online model-free solution is developed for the infinite-horizon optimal control problem for continuous-time nonlinear systems. A novel actor-critic-identifier (ACI) structure is used to implement the Policy Iteration algorithm, wherein two neural network structures are used: a robust dynamic neural network (DNN) to asymptotically identify the uncertain system with additive disturbances, and a critic NN to approximate the value function. The weight update laws for the critic NN are generated using a gradient-descent method based on a modified temporal difference error, which is independent of the system dynamics. The optimal control law (or the actor) is computed using the critic NN and the identifier DNN. Uniformly ultimately bounded (UUB) stability of the closed-loop system is guaranteed. The actor, critic and identifier structures are implemented in real time, continuously and simultaneously.
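The actor-critic-identifier loop described in the abstract can be illustrated with a toy sketch. This is not the authors' update law or network architecture; it is a minimal example under strong assumptions: a scalar linear plant x' = a x + b u with cost q x^2 + r u^2, a one-weight critic V(x) = w x^2, and a stand-in "identifier" that simply returns the exact state derivative and input gain (the paper uses a robust DNN for this role). All names (`run_actor_critic`, `eta`, `reset_every`) and all numeric values are hypothetical. The critic update is a normalized gradient step on a Bellman-type residual that uses the estimated derivative rather than the drift `a`, and the state is reset periodically so the residual stays informative.

```python
import numpy as np

def run_actor_critic(a=-1.0, b=1.0, q=1.0, r=1.0, eta=5.0, dt=0.01,
                     steps=2000, reset_every=100, seed=0):
    """Toy actor-critic loop on a scalar plant; returns the final critic weight."""
    rng = np.random.default_rng(seed)
    w = 0.0   # critic weight, V(x) = w * x^2
    x = 1.5   # plant state
    for k in range(steps):
        # Periodic state resets keep the learning signal from vanishing as x -> 0.
        if k % reset_every == 0:
            x = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 2.0)
        grad_v = 2.0 * w * x                 # dV/dx from the critic
        u = -0.5 * b * grad_v / r            # actor: u = -(1/2) r^{-1} b dV/dx
        # Identifier stand-in (assumption): exact derivative instead of a DNN estimate.
        x_dot_hat = a * x + b * u
        omega = 2.0 * x * x_dot_hat          # regressor: d(delta)/dw with the policy held fixed
        delta = w * omega + q * x**2 + r * u**2   # Bellman-type residual
        # Normalized gradient-descent step on the critic weight.
        w -= dt * eta * omega * delta / (1.0 + omega**2)
        x += dt * x_dot_hat                  # Euler step of the plant
    return w

w_final = run_actor_critic()
p_star = np.sqrt(2.0) - 1.0  # positive root of 2*a*p + q - (b**2 / r) * p**2 = 0 for the defaults
print(w_final, p_star)
```

For these default values the residual vanishes exactly when `w` solves the scalar Riccati equation, so `w_final` can be compared against `p_star` as a sanity check. The critic step never reads `a` directly; it only sees the state, the control, and the derivative estimate, which is the sense in which such an update is independent of the drift dynamics.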
