Abstract: Policy gradient algorithms are reinforcement learning methods that optimize a control policy by performing stochastic gradient descent with respect to the controller parameters. In this paper, we extend actor-critic algorithms by adding an $\ell_1$-norm regularization term on the actor part, which enables the algorithm to automatically select and optimize the useful controller basis functions. Our method is closely related to existing approaches to sparse controller design and actuator selection, but in contrast to these, our approach runs online
and does not require a plant model. In order to utilize $\ell_1$ regularization online, the actor updates are extended to include an
iterative soft-thresholding step. Convergence of the algorithm is proved using methods from stochastic approximation. The
effectiveness of our algorithm for control basis and actuator selection is demonstrated on numerical examples.
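The abstract does not spell out the update equations, so the following is only a rough illustrative sketch (in Python/NumPy) of how an $\ell_1$-regularized actor-critic update with an iterative soft-thresholding step might look for a linear-in-features Gaussian policy. All function names, step sizes, and the choice of policy parameterization are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def soft_threshold(w, kappa):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(w) * np.maximum(np.abs(w) - kappa, 0.0)

def actor_critic_step(theta, v, phi_s, phi_s_next, action, reward,
                      gamma=0.99, alpha_critic=0.05, alpha_actor=0.01, lam=0.1):
    """One hypothetical l1-regularized actor-critic update.

    theta             : actor weights over controller basis functions (to be sparsified)
    v                 : critic weights (linear value-function approximation)
    phi_s, phi_s_next : feature vectors of the current and next state
    action            : scalar action sampled from a Gaussian policy with mean theta @ phi_s
    reward            : observed reward
    lam               : l1 regularization strength on the actor
    """
    # Critic: TD(0) update of the linear value-function approximation
    delta = reward + gamma * (v @ phi_s_next) - (v @ phi_s)
    v = v + alpha_critic * delta * phi_s

    # Actor: policy-gradient step using the TD error as the advantage estimate,
    # for a Gaussian policy with mean theta @ phi_s and unit variance
    score = (action - theta @ phi_s) * phi_s      # gradient of log-density w.r.t. theta
    theta = theta + alpha_actor * delta * score

    # Iterative soft-thresholding step: proximal map of the l1 penalty,
    # which drives the weights of unneeded basis functions to exactly zero
    theta = soft_threshold(theta, alpha_actor * lam)
    return theta, v
```

In this sketch the soft-thresholding acts as the proximal step of the $\ell_1$ penalty applied after each stochastic actor update, which is what yields exact zeros in the actor weights and hence an automatic selection of controller basis functions (or actuators, when each basis function corresponds to an actuator).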