Keywords: policy learning, ensemble policy learning
TL;DR: We formally prove that (non-linear) neural policy ensembles are sub-optimal with respect to linear policy ensembles, and empirically validate our theory.
Abstract: We develop a theoretical framework to formally prove that (non-linear) neural policy ensembles are sub-optimal with respect to linear policy ensembles. We validate our theoretical claims empirically through a variety of comparisons between policy ensembles composed of linear policies and of (non-linear) neural policies. We show that well-tuned neural policy ensembles $\Pi^{N}$ underperform equivalent linear ensembles, often by two orders of magnitude. We further show that, under identical operating conditions and with every member policy individually stable, $\Pi^{N}$ can exhibit significant instability while linear policy ensembles remain stable. This sub-optimality has significant implications for all neural policy ensemble research, from approaches based on Reinforcement Learning to Mixture-of-Experts agentic-AI policies.
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 9222