Abstract: Highlights
• A novel multi-step Q-learning method is proposed to improve the data efficiency of deep reinforcement learning (DRL).
• The proposed multi-step Q-learning method is derived by adopting a new return function.
• The new return function alters the discounting of future rewards and lessens the impact of the immediate reward.
• Experimental results show that the proposed methods can improve the data efficiency of DRL agents.