Enhancing Portfolio Optimization via Heuristic-Guided Inverse Reinforcement Learning with Multi-Objective Reward and Graph-based Policy Learning

Published: 2025 · Last Modified: 21 Jan 2026 · IJCAI 2025 · License: CC BY-SA 4.0
Abstract: Portfolio optimization faces persistent challenges in adapting to dynamic markets due to static assumptions and high-dimensional decision spaces. Although reinforcement learning (RL) has emerged as a promising solution, conventional reward engineering often fails to capture complex market dynamics. Recent advances in deep RL and graph neural networks have attempted to enhance market microstructure modeling, yet these methods still struggle to systematically integrate financial knowledge. To address these issues, we propose a novel heuristic-guided inverse reinforcement learning framework for portfolio optimization. Specifically, our framework provides an interpretable expert strategy generation mechanism that accounts for sector diversification and correlation constraints. A multi-objective reward optimization method then adaptively balances returns against risks. Furthermore, the framework employs heterogeneous graph policy learning with hierarchical attention mechanisms to explicitly model inter-stock relationships. Finally, we conduct extensive experiments on real-world financial market data, demonstrating that our framework outperforms several state-of-the-art deep learning and RL baselines in terms of risk-adjusted returns. We provide case studies showcasing the framework's ability to balance return maximization and risk containment. Our code is publicly available at https://github.com/ChloeWenyiZhang/SmartFolio/.
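To illustrate the return-risk trade-off that a multi-objective reward encodes, the following is a minimal sketch of one common formulation: expected portfolio return minus a volatility penalty. The function name, the fixed trade-off coefficient `lam`, and the use of volatility as the risk term are illustrative assumptions, not the paper's actual reward (which adapts the balance dynamically).

```python
import numpy as np

def multi_objective_reward(returns: np.ndarray, weights: np.ndarray,
                           lam: float = 0.5) -> float:
    """Sketch of a return-vs-risk reward: mean return minus lam * volatility.

    returns: (T, N) per-period asset returns; weights: (N,) portfolio weights.
    `lam` is a fixed trade-off coefficient here for illustration; the paper's
    framework adapts this balance rather than fixing it.
    """
    port_returns = returns @ weights      # per-period portfolio returns, shape (T,)
    expected = port_returns.mean()        # return objective
    risk = port_returns.std()             # risk objective (volatility)
    return float(expected - lam * risk)

# Two assets over four periods, equal-weight portfolio (toy numbers).
rets = np.array([[0.01, 0.02],
                 [-0.01, 0.00],
                 [0.02, 0.01],
                 [0.00, -0.01]])
w = np.array([0.5, 0.5])
reward = multi_objective_reward(rets, w, lam=0.5)
```

Raising `lam` shifts the optimum toward lower-volatility portfolios; an adaptive scheme would adjust it in response to market conditions instead of holding it fixed.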