TL;DR: We present the Directly Forecasting Belief Transformer (DFBT), a belief estimation method that effectively reduces compounding errors and improves performance in delayed reinforcement learning.
Abstract: Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states, which causes compounding errors to accumulate. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states step-by-step. We theoretically demonstrate that DFBT greatly reduces the compounding errors of existing recursive forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors and achieves remarkable prediction accuracy. DFBT's capability to forecast state sequences also facilitates multi-step bootstrapping, thus greatly improving learning efficiency. On the MuJoCo benchmark, our DFBT-based method substantially outperforms SOTA baselines. Code is available at \href{https://github.com/QingyuanWuNothing/DFBT}{https://github.com/QingyuanWuNothing/DFBT}.
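To illustrate the contrast the abstract draws between recursive and direct forecasting, here is a minimal PyTorch sketch. It assumes a simplified, fully observed state; the class names (`OneStepModel`, `DirectBeliefTransformer`), dimensions, and architecture are illustrative assumptions, not the authors' released DFBT implementation:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only.
STATE_DIM, DELAY = 8, 4

class OneStepModel(nn.Module):
    """Recursive baseline: predicts s_{t+1} from s_t, rolled out DELAY times."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, STATE_DIM))

    def forward(self, s):
        return self.net(s)

class DirectBeliefTransformer(nn.Module):
    """Direct forecaster: maps the last observed state to all DELAY future
    states in one pass, so one-step errors are never fed back as inputs."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(STATE_DIM, 64)
        layer = nn.TransformerEncoderLayer(d_model=64, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.queries = nn.Parameter(torch.randn(DELAY, 64))  # one per horizon
        self.head = nn.Linear(64, STATE_DIM)

    def forward(self, s):                       # s: (batch, STATE_DIM)
        ctx = self.embed(s).unsqueeze(1)        # (batch, 1, 64)
        q = self.queries.expand(s.size(0), -1, -1)
        h = self.encoder(torch.cat([ctx, q], dim=1))[:, 1:]  # drop context slot
        return self.head(h)                     # (batch, DELAY, STATE_DIM)

def recursive_forecast(model, s, horizon=DELAY):
    """Recursive rollout: each prediction is fed back in, so errors compound."""
    preds = []
    for _ in range(horizon):
        s = model(s)  # input now contains the previous step's error
        preds.append(s)
    return torch.stack(preds, dim=1)

s0 = torch.randn(2, STATE_DIM)
print(recursive_forecast(OneStepModel(), s0).shape)  # (2, 4, 8)
print(DirectBeliefTransformer()(s0).shape)           # (2, 4, 8)
```

The design difference is the point: in the recursive rollout, each forecast consumes the previous forecast as input, so per-step errors propagate through the horizon, whereas the direct forecaster conditions every horizon step only on the actual observation.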
Lay Summary: Robots and AI systems often make decisions based on delayed information, which can cause mistakes to build up over time. Most current techniques try to guess the missing information step-by-step, but small errors add up quickly. Our research introduces a new method called the Directly Forecasting Belief Transformer (DFBT). Unlike existing approaches, DFBT makes direct predictions all at once instead of in a chain. This helps it stay accurate and learn faster. We tested it on popular benchmarks and found that it substantially outperforms existing methods.
Link To Code: https://github.com/QingyuanWuNothing/DFBT
Primary Area: Reinforcement Learning
Keywords: reinforcement learning, reinforcement learning with delays, belief representation
Submission Number: 1344