Abstract: Highlights
• A more accurate state-action value estimation method is proposed.
• Two algorithms are developed: Reinforced Q-learning and Reinforced Delayed Deep Deterministic Policy Gradient.
• Mean squared error is used to analyze the bias and variance of different estimation methods.
• The effectiveness of the proposed method is analyzed theoretically, and its upper bound is explored.
• Experimental results on multiple benchmarks indicate that the proposed methods can achieve state-of-the-art performance.
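To make the bias and variance framing concrete, the sketch below checks the standard identity MSE = bias^2 + variance on a toy value estimator. It is an illustration only and not the method proposed in the paper: the function name `max_estimator_mse`, the Gaussian return noise, and all parameter values are assumptions chosen for the example. It uses the maximum over noisy per-action sample means, the kind of target used in ordinary Q-learning, which is known to overestimate when all true action values are equal.

```python
import random


def max_estimator_mse(n_actions=4, n_samples=5, n_trials=2000, seed=0):
    """Estimate bias, variance and MSE of max-over-actions value estimates.

    Hypothetical toy setup: every action has true value 0, and each observed
    return is the true value plus standard Gaussian noise.
    """
    rng = random.Random(seed)
    true_max = 0.0  # true maximum action value in this toy problem

    estimates = []
    for _ in range(n_trials):
        # Sample-mean value estimate for each action from noisy returns.
        q_hat = [
            sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]
        # Taking the max over noisy estimates is what introduces the bias.
        estimates.append(max(q_hat))

    mean_estimate = sum(estimates) / n_trials
    bias = mean_estimate - true_max
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / n_trials
    mse = sum((e - true_max) ** 2 for e in estimates) / n_trials
    return bias, variance, mse


if __name__ == "__main__":
    bias, variance, mse = max_estimator_mse()
    print(f"bias={bias:.4f} variance={variance:.4f} mse={mse:.4f}")
    print(f"bias^2 + variance={bias ** 2 + variance:.4f}")
```

Under these assumptions the bias comes out positive, and the empirical MSE matches bias^2 + variance up to floating-point error, so a single MSE figure can be split into the two sources of error that the highlights refer to.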