Controlling estimation error in reinforcement learning via Reinforced Operation

Published: 01 Jan 2024, Last Modified: 30 Sept 2024 | Venue: Inf. Sci. 2024 | License: CC BY-SA 4.0
Abstract: Highlights
• A more accurate state-action value estimation method is proposed.
• Two algorithms are developed: Reinforced Q-learning and Reinforced Delayed Deep Deterministic policy gradient.
• Mean Square Error is used in a novel way to analyze the bias and variance of different methods.
• The effectiveness of the proposed method is analyzed theoretically, and the upper bound is explored.
• Experimental results on multiple benchmarks indicate that our methods can achieve state-of-the-art performance.
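To make the bias and variance highlight concrete, the following is a minimal sketch of the standard decomposition MSE = bias^2 + variance applied to a value estimator. It does not implement the paper's Reinforced Operation. It assumes, purely for illustration, the familiar max-of-sample-means estimator on a toy problem where every action has a true value of 0 and returns carry unit-variance Gaussian noise. The function name `max_estimator_errors` and all of its parameters are hypothetical.

```python
import random
import statistics


def max_estimator_errors(n_actions=5, n_samples=10, n_trials=2000, seed=0):
    """Empirical bias, variance, and MSE of the max-of-sample-means estimator.

    Assumption for illustration: every action has true value 0, so the
    true maximum action value is also 0.
    """
    rng = random.Random(seed)
    true_max = 0.0
    estimates = []
    for _ in range(n_trials):
        # Sample mean of noisy returns for each action.
        means = []
        for _ in range(n_actions):
            returns = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            means.append(statistics.fmean(returns))
        # Taking the max over noisy means is the estimator under study.
        estimates.append(max(means))

    bias = statistics.fmean(estimates) - true_max
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_max) ** 2 for e in estimates])
    return bias, variance, mse


if __name__ == "__main__":
    bias, variance, mse = max_estimator_errors()
    print(f"bias={bias:.4f} variance={variance:.4f} mse={mse:.4f}")
    print(f"bias^2 + variance={bias ** 2 + variance:.4f}")
```

In this toy setting the bias comes out positive, because the maximum of several noisy means tends to exceed the true maximum, and the empirical MSE matches bias^2 + variance up to floating-point error. Comparing two estimators by MSE therefore accounts for both error sources at once, which is the kind of comparison the highlight describes.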