DRL-STAF: A Deep Reinforcement Learning Framework for State-aware Forecasting of Complex Multivariate Hidden Markov Process

Manrui Jiang; Jingru Huang; Yong Chen; Chen Zhang

19 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. Readers: Everyone. License: CC BY 4.0

Keywords: Multivariate Hidden Markov Process, State Estimation, Hidden Markov Model, Deep Reinforcement Learning, Deep learning

TL;DR: We propose a new state-aware forecasting framework for multivariate hidden Markov processes, which integrates deep learning for flexible emission modeling with deep reinforcement learning to learn complex time-varying latent state transitions.

Abstract: Forecasting multivariate hidden Markov processes remains challenging due to nonlinearity, nonstationarity, hidden state transitions, and cross-sequence dependencies. Deep learning (DL) methods have shown strong predictive performance in time series forecasting but generally lack explicit state modeling and interpretable state estimation. Hidden Markov models (HMMs) and their variants provide explicit state representations, but they are limited in capturing complex nonlinear observation patterns and suffer from scalability issues. To address these limitations, we propose a Deep Reinforcement Learning-based framework for STate-Aware Forecasting of complex multivariate hidden Markov processes (DRL-STAF), which simultaneously predicts the next-step observation and estimates the corresponding hidden state. In the proposed framework, deep learning serves as the emission function to capture complex nonlinear observation patterns, while deep reinforcement learning models state transitions, supporting flexible adaptation to diverse transition patterns without predefined structural assumptions. In particular, DRL-STAF remains effective on complex multivariate hidden Markov processes, such as coupled higher-order semi-Markov dynamics, that typically suffer from state-space explosion. Comprehensive experiments demonstrate superior predictive performance and accurate state estimation compared with HMMs and their variants, standalone deep learning methods, and existing DL-HMM hybrid methods.
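
The division of labor described in the abstract (an emission function that maps a hidden state to a predicted observation, and a learned policy that picks the next hidden state) can be illustrated with a deliberately tiny sketch. This is not the DRL-STAF implementation: the abstract does not specify the reward, the agent's input, or the network architectures. Everything below is an assumption made for illustration, including the two-state setup, the fixed per-state means standing in for a deep emission network, the greedy value table with optimistic initialization standing in for a deep RL agent, the reward defined as negative absolute forecast error, and all function names.

```python
# Toy sketch only: NOT the DRL-STAF implementation.
# Assumptions (none are from the paper): two hidden states, fixed per-state
# emission means standing in for a deep emission network, a value table
# standing in for a deep RL agent, and reward = -|forecast error|.

EMISSION_MEAN = [0.0, 5.0]  # hypothetical emission function: state -> predicted observation


def context_of(x):
    """Discretize the last observation into a context index (0 or 1)."""
    return 0 if x < 2.5 else 1


def train(observations):
    """Learn which hidden state to pick next, using forecast error as the only signal."""
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[context][state]; 0 is optimistic because rewards <= 0
    n = [[0, 0], [0, 0]]          # visit counts for sample-average updates
    for x_now, x_next in zip(observations, observations[1:]):
        c = context_of(x_now)
        a = max(range(2), key=lambda s: q[c][s])  # greedy choice of the next hidden state
        reward = -abs(EMISSION_MEAN[a] - x_next)  # penalize the forecast error
        n[c][a] += 1
        q[c][a] += (reward - q[c][a]) / n[c][a]   # running average of observed rewards
    return q


def forecast(q, x_now):
    """Return (estimated next hidden state, predicted next observation)."""
    c = context_of(x_now)
    state = max(range(2), key=lambda s: q[c][s])
    return state, EMISSION_MEAN[state]


# Regime-switching toy series: three steps in state 0, then three in state 1, repeated.
series = ([0.0] * 3 + [5.0] * 3) * 50
q_table = train(series)
print(forecast(q_table, 0.0), forecast(q_table, 5.0))
```

On this toy series the learned policy settles on staying in the current regime, so each call to the hypothetical `forecast` helper returns both a state estimate and a next-step prediction. The point of the sketch is only that the state-selection policy is trained from forecast error rather than from a predefined transition matrix.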

Primary Area: learning on time series and dynamical systems

Submission Number: 16143
