Neural Policy Ensembles are Sub-Optimal

Gregory Provan

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0

Keywords: policy learning, ensemble policy learning

TL;DR: We formally prove that (non-linear) neural policy ensembles are sub-optimal with respect to linear policy ensembles, and empirically validate our theory.

Abstract: We develop a theoretical framework to formally prove that (non-linear) neural policy ensembles are sub-optimal with respect to linear policy ensembles. We validate our theoretical claims empirically through a variety of comparisons between policy ensembles composed of linear policies and of (non-linear) neural policies. We show that well-tuned neural policy ensembles $\Pi^{N}$ underperform equivalent linear ensembles, often by two orders of magnitude. We further show that, under identical operating conditions for ensembles of policies (each of which is stable), $\Pi^{N}$ can show significant instability while linear policy ensembles remain stable. This sub-optimality has significant implications for all neural policy ensemble research, from ensembles based on Reinforcement Learning to Mixture-of-Experts agentic-AI policies.

Supplementary Material: zip

Primary Area: learning on time series and dynamical systems

Submission Number: 9222
