Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes
Keywords: Policy Abstraction, Value Generalization, Policy Space Compression
Abstract: In intelligent decision-making systems, how a policy is represented and optimized is a fundamental problem, and the root challenge stems from the large scale and high complexity of the policy space. Towards a desirable surrogate policy space, recent work on representing policies in a low-dimensional latent space has shown potential for improving both policy evaluation and optimization. The key question in this line of research is by what criterion the policy space should be abstracted to achieve favorable compression and generalization. However, both the theory of policy abstraction and methods for policy representation learning remain under-studied; in this work, we make initial efforts to fill this vacancy. First, we propose a unified policy abstraction theory, containing three types of policy abstraction and the partial-ordering relationships among them. Then, we generalize these policy abstractions to three policy metrics that quantify the distance between policies. Further, we propose a policy representation learning approach based on deep metric learning, together with a corresponding policy optimization algorithm. Our study highlights the importance of policy abstraction theory and representation learning methods, demonstrating their effectiveness in compressing the policy space, characterizing differences between policies, and supporting value generalization across policies.
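To make the abstract's pipeline concrete, the following is a minimal illustrative sketch, not the paper's actual method: it estimates one common policy pseudometric (the expected total-variation distance between two policies' action distributions over sampled states) and then trains an embedding so that latent distances match that behavioral metric, in the spirit of deep metric learning. All names here (`policy_distance`, `PolicyEncoder`, `metric_learning_loss`) and architectural choices are assumptions for illustration.

```python
import torch
import torch.nn as nn

def policy_distance(pi_a, pi_b, states):
    """Monte Carlo estimate of a behavioral policy pseudometric:
    the expected total-variation distance between the action
    distributions of two policies over a batch of sampled states.
    pi_a, pi_b: callables mapping a state batch (N, state_dim)
    to action probabilities (N, num_actions)."""
    with torch.no_grad():
        p = pi_a(states)
        q = pi_b(states)
        # TV distance per state, averaged over the state batch.
        return 0.5 * (p - q).abs().sum(dim=-1).mean()

class PolicyEncoder(nn.Module):
    """Hypothetical encoder mapping flattened policy parameters
    to a low-dimensional policy embedding (the surrogate space)."""
    def __init__(self, param_dim, embed_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(param_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, params):
        return self.net(params)

def metric_learning_loss(encoder, params_a, params_b, target_dist):
    """Regress embedding distances toward behavioral policy
    distances, so the latent space preserves the chosen metric."""
    z_a, z_b = encoder(params_a), encoder(params_b)
    embed_dist = (z_a - z_b).norm(dim=-1)
    return ((embed_dist - target_dist) ** 2).mean()
```

Under this sketch, `target_dist` would come from `policy_distance` evaluated on pairs of sampled policies, and the trained embedding could then serve as the compressed policy space over which values are generalized and policies optimized.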
Area: Representation and Reasoning (RR)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 894