Abstract: Dear Editor, This letter investigates the multi-objective optimal control problem for nonlinear discrete-time systems. A data-driven policy gradient algorithm is proposed in which the action-state value function is used to evaluate the policy. In the policy improvement step, a policy-gradient-based method is employed, which improves the performance of the system and ultimately yields a policy that is optimal in the Pareto sense. An actor-critic structure is established to implement the algorithm. To improve data efficiency and enhance learning, experience replay is used during training, drawing on both offline and online data. Finally, a simulation is given to illustrate the effectiveness of the method.
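The experience replay step mentioned above, which draws on both offline and online data, can be illustrated with a minimal sketch. The abstract does not specify how the two data sources are stored or mixed, so everything here is an assumption made for illustration: the class name `MixedReplayBuffer`, the `offline_fraction` parameter, and the tuple layout of a transition are all hypothetical.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical replay buffer mixing a fixed offline data set with
    a bounded stream of online transitions (an assumed design, not the
    letter's stated implementation)."""

    def __init__(self, offline_data, online_capacity=1000, seed=0):
        self.offline = list(offline_data)            # pre-collected transitions
        self.online = deque(maxlen=online_capacity)  # oldest entries are dropped
        self.rng = random.Random(seed)               # seeded for reproducibility

    def add_online(self, transition):
        """Store a transition observed while the current policy runs."""
        self.online.append(transition)

    def sample(self, batch_size, offline_fraction=0.5):
        """Draw a minibatch; offline_fraction is an assumed mixing ratio."""
        n_off = min(int(batch_size * offline_fraction), len(self.offline))
        n_on = min(batch_size - n_off, len(self.online))
        batch = self.rng.sample(self.offline, n_off)
        batch += self.rng.sample(list(self.online), n_on)
        self.rng.shuffle(batch)
        return batch


# Usage: each transition is assumed to be (state, action, costs, next_state),
# where costs is a tuple with one entry per objective.
offline = [((0.1 * i,), (0.0,), (1.0, 2.0), (0.1 * (i + 1),)) for i in range(10)]
buffer = MixedReplayBuffer(offline, online_capacity=100)
buffer.add_online(((1.0,), (0.5,), (0.8, 1.5), (1.1,)))
minibatch = buffer.sample(batch_size=4)
```

In an actor-critic loop, a minibatch like this would feed the critic update (policy evaluation) and then the actor update (policy improvement); the mixing ratio is a tuning choice rather than something the abstract prescribes.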