Trust Region Policy Optimization for Functional Linear Policies

03 Apr 2026 (modified: 21 Apr 2026) · Under review for TMLR · CC BY 4.0
Abstract: Reinforcement Learning (RL) tasks where the states are given by spatial or temporal measurements often lead to high-dimensional state spaces, making function approximation difficult and unstable. We adapt the classic RL framework to allow the direct use of the inherent functional state, which can be estimated from the discrete measurements. We propose a suitable family of policies based on functional linear models, allowing us to take actions conditioned on functional states. Moreover, we extend Trust Region Policy Optimization (TRPO) to improve such policies, addressing the challenge of operator inversion in infinite-dimensional spaces with techniques from Functional Data Analysis (FDA). Furthermore, we implement Proximal Policy Optimization (PPO) for these policies. In experiments on three PDE control tasks, functional policies yield more stable training and achieve better performance than multilayer perceptron policies, highlighting the benefits of functional representations in RL.
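The abstract's core idea of acting on a functional state estimated from discrete measurements can be sketched as follows. This is a minimal illustrative example, not the paper's method: it assumes the functional state is reconstructed by least-squares projection onto a truncated Fourier basis (an FDA-style representation), and the action is a linear functional of the state, computed from the basis coefficients. All names (`fourier_basis`, `functional_linear_action`, `theta`) are hypothetical.

```python
import numpy as np

def fourier_basis(t, n_basis):
    # Evaluate a truncated Fourier basis on a grid t in [0, 1].
    cols = [np.ones_like(t)]
    for k in range(1, (n_basis + 1) // 2 + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)[:, :n_basis]  # shape (len(t), n_basis)

def functional_linear_action(measurements, t, theta, n_basis=7):
    # Reconstruct the functional state from discrete measurements by
    # least-squares projection onto the basis, then take an action that is
    # linear in the state: a = <beta, x> ~= theta^T c, where c are the
    # basis coefficients of x and theta represents beta in the same basis.
    Phi = fourier_basis(t, n_basis)
    coeffs, *_ = np.linalg.lstsq(Phi, measurements, rcond=None)
    return float(theta @ coeffs)

# Usage: a noisy sine-shaped state observed at 50 grid points.
t = np.linspace(0.0, 1.0, 50)
x = np.sin(2 * np.pi * t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
theta = np.zeros(7)
theta[1] = 1.0  # weight only the sin(2*pi*t) coefficient
a = functional_linear_action(x, t, theta)  # close to 1.0 for this state
```

A trust-region update for such a policy would then operate on the finite coefficient vector, which is where the operator-inversion issue mentioned in the abstract arises when the basis truncation is refined.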
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Mirco_Mutti1
Submission Number: 8245