Unrolled Policy Iteration Via Graph Filters

Published: 23 Sept 2025, Last Modified: 21 Oct 2025 · NPGML Poster · CC BY 4.0
Keywords: Dynamic Programming, Graph Signal Processing, Algorithm Unrolling, Graph Filter
TL;DR: This work unrolls policy iteration from a graph signal processing perspective
Abstract: Dynamic programming (DP) is a cornerstone for solving Markov decision processes (MDPs) through Bellman's optimality equations. Classical algorithms such as policy iteration exploit this fixed-point structure but become costly in large state–action spaces or with long-term dependencies. We propose BellNet, a parametric model that unrolls and truncates policy iteration and is trained to minimize the Bellman error from random value-function initializations. By interpreting the MDP transition matrix as the adjacency matrix of a weighted directed graph, we leverage graph signal processing to re-parameterize BellNet as a cascade of nonlinear graph filters, yielding a concise, transferable representation of policy and value iteration with explicit control over inference complexity. Experiments in grid environments show that BellNet approximates optimal policies in far fewer iterations than classical methods and generalizes, without retraining, to related unseen tasks.
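To make the construction concrete, below is a minimal sketch of the unrolled architecture in PyTorch, assuming a small tabular MDP with transition tensor $P \in \mathbb{R}^{A\times S\times S}$ and reward matrix $r \in \mathbb{R}^{S\times A}$. Each layer performs a soft policy-improvement step followed by a truncated policy evaluation written as a $K$-tap graph filter $\sum_{k=0}^{K} h_k (\gamma P_\pi)^k r_\pi$, i.e., a learnable truncation of the Neumann series for $(I - \gamma P_\pi)^{-1} r_\pi$. All names here (`BellNetLayer`, `bellman_error`, the taps `h`, the softmax temperature `tau`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of unrolled policy iteration as a cascade of graph filters.
import torch
import torch.nn as nn

class BellNetLayer(nn.Module):
    """One unrolled iteration: soft policy improvement, then truncated
    policy evaluation expressed as a learnable K-tap graph filter."""
    def __init__(self, K, gamma, tau=1.0):
        super().__init__()
        self.h = nn.Parameter(torch.ones(K + 1) / (K + 1))  # filter taps (assumed form)
        self.gamma, self.tau = gamma, tau

    def forward(self, V, P, r):
        # Soft policy improvement from the current value estimate.
        Q = r + self.gamma * torch.einsum('ast,t->sa', P, V)   # (S, A)
        pi = torch.softmax(Q / self.tau, dim=1)                # differentiable greedy step
        # Policy-induced transition matrix = adjacency of a weighted digraph.
        P_pi = torch.einsum('sa,ast->st', pi, P)               # (S, S)
        r_pi = (pi * r).sum(dim=1)                             # (S,)
        # Graph-filter evaluation: sum_k h_k (gamma * P_pi)^k r_pi.
        x, V_out = r_pi, self.h[0] * r_pi
        for hk in self.h[1:]:
            x = self.gamma * (P_pi @ x)
            V_out = V_out + hk * x
        return V_out

class BellNet(nn.Module):
    def __init__(self, n_layers=3, K=4, gamma=0.9):
        super().__init__()
        self.layers = nn.ModuleList([BellNetLayer(K, gamma) for _ in range(n_layers)])
        self.gamma = gamma

    def forward(self, V0, P, r):
        V = V0
        for layer in self.layers:
            V = layer(V, P, r)
        return V

def bellman_error(V, P, r, gamma):
    # Residual of the Bellman optimality operator, used as the training loss.
    Q = r + gamma * torch.einsum('ast,t->sa', P, V)
    return ((Q.max(dim=1).values - V) ** 2).mean()

# Toy usage: a random tabular MDP, trained from random value initializations.
S, A = 25, 4
P = torch.softmax(torch.randn(A, S, S), dim=-1)  # row-stochastic transitions
r = torch.rand(S, A)
net = BellNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for step in range(500):
    V0 = torch.randn(S)                          # random value-function initialization
    loss = bellman_error(net(V0, P, r), P, r, net.gamma)
    opt.zero_grad(); loss.backward(); opt.step()
```

The number of layers and the filter order K directly bound the inference cost, which is one reading of the abstract's "explicit control of inference complexity"; the softmax in place of a hard argmax is an assumed device to keep the unrolled network differentiable end to end.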
Submission Number: 91