CFQI: Fitted Q-Iteration with Complex Returns

Robert William Wright, Xingye Qiao, Lei Yu, Steven Loscalzo

AAMAS 2015 (modified: 06 Nov 2022)

Abstract: Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update, which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but they have not been used in an AVI context because of off-policy bias. In this paper we propose Complex Fitted Q-Iteration (CFQI), a generalization of FQI that allows for complex returns. We prove theoretical properties showing that CFQI preserves the existing convergence properties of FQI. We present two methods for integrating complex returns. The first uses a simple truncation procedure to reduce off-policy bias. The second applies a novel bounding operation that makes use of the off-policy bias. We empirically evaluate the proposed methods on several reinforcement learning benchmarks, and the results show that our methods significantly improve over FQI in value estimation accuracy, policy performance, and convergence speed.
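The contrast between FQI's 1-step return and a complex return can be made concrete with a small sketch. It assumes a single finite trajectory with no terminal states, bootstrap values standing in for max_a Q(s', a) from the current fitted Q-function, and the lambda-return weighting from TD(lambda) as one standard way to average n-step returns. That weighting is an illustrative choice, not necessarily the one used in CFQI, and the helper names `n_step_return` and `lambda_return` are hypothetical. The paper's truncation and bounding procedures are not reproduced here.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap: Sequence[float],
                  t: int, n: int, gamma: float) -> float:
    """R_t^(n): n discounted rewards starting at step t, then a bootstrap value.

    rewards[k]:   reward r_k received after taking action a_k in state s_k
    bootstrap[k]: (assumed) estimate of max_a Q(s_{k+1}, a) for the state reached after step k
    With n = 1 this is the usual FQI regression target r_t + gamma * max_a Q(s_{t+1}, a).
    """
    if not 1 <= n <= len(rewards) - t:
        raise ValueError("n must satisfy 1 <= n <= len(rewards) - t")
    discounted = sum(gamma ** i * rewards[t + i] for i in range(n))
    return discounted + gamma ** n * bootstrap[t + n - 1]


def lambda_return(rewards: Sequence[float], bootstrap: Sequence[float],
                  t: int, gamma: float, lam: float) -> float:
    """Complex return as a lambda-weighted average of every n-step return available from t.

    Weights are (1 - lam) * lam**(n - 1) for n < N and lam**(N - 1) for the longest
    return n = N, so they sum to 1 on a finite trajectory.
    lam = 0 recovers the 1-step return; lam = 1 gives the full-length return.
    """
    horizon = len(rewards) - t  # N: number of steps left in the trajectory
    total = 0.0
    for n in range(1, horizon):
        weight = (1 - lam) * lam ** (n - 1)
        total += weight * n_step_return(rewards, bootstrap, t, n, gamma)
    total += lam ** (horizon - 1) * n_step_return(rewards, bootstrap, t, horizon, gamma)
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    bootstrap = [0.0, 0.0, 0.0]
    print(n_step_return(rewards, bootstrap, 0, 1, 1.0))  # 1-step target: 1.0
    print(lambda_return(rewards, bootstrap, 0, 1.0, 0.5))  # 0.5*1 + 0.25*2 + 0.25*3 = 1.75
```

On off-policy data, rewards past the first step come from the behavior policy rather than the greedy policy, so the longer n-step returns that the weighted average draws on carry the off-policy bias the abstract refers to. The 1-step target avoids this by bootstrapping immediately.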
