Enhancing deep reinforcement learning for stock trading: a reward shaping approach via expert feedback
Abstract: Deep Reinforcement Learning (DRL) techniques have shown significant advances in automated stock trading. DRL models attempt to learn the complex dynamics of the market by interacting with it through trial and error. The reward function is the crux of reinforcement learning: it guides the agent toward its goal by providing continuous feedback. A common approach in the literature is to use a profit-based reward, as it aligns with the agent’s objective of maximizing returns. However, a profit-loss-based reward function is inherently noisy and often induces high variance, leading to unstable learning, particularly in volatile markets. The choice of an appropriate reward function is therefore crucial, as a poorly designed one may lead to suboptimal agent behavior. This paper proposes a reward-shaping scheme that leverages expert feedback to guide the agent toward more informed trading decisions. The expert generates technical-indicator-based trading signals and gives the agent dynamic feedback according to market conditions. The efficacy of the proposed approach is validated using stocks from four prominent global stock market indices: Dow, Sensex, TWSE, and FTSE. The empirical results demonstrate that the proposed methodology generates higher returns across various performance metrics.
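The reward-shaping idea described in the abstract can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation: the expert signal here is a simple moving-average crossover, and all names (`sma_signal`, `shaped_reward`, the shaping weight `beta`) are assumptions for the sake of the example.

```python
def sma_signal(prices, short=3, long=5):
    """Hypothetical expert signal from a moving-average crossover:
    +1 (buy) if the short SMA is above the long SMA, -1 (sell) otherwise,
    0 when there is not enough price history."""
    if len(prices) < long:
        return 0  # neutral signal until enough history accumulates
    short_sma = sum(prices[-short:]) / short
    long_sma = sum(prices[-long:]) / long
    return 1 if short_sma > long_sma else -1


def shaped_reward(profit, action, expert_signal, beta=0.1):
    """Base profit/loss reward plus a small bonus (penalty) when the
    agent's action agrees (disagrees) with the expert signal.
    beta controls the strength of the shaping term."""
    agreement = 1.0 if action == expert_signal else -1.0
    return profit + beta * agreement
```

In this sketch, an agent that buys (`action = 1`) during an uptrend flagged by the expert receives a slightly larger reward than raw profit alone, while an agent acting against the signal is mildly penalized; keeping `beta` small preserves the primacy of the profit objective.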
External IDs: dblp:journals/kais/OrraCST25