A Dirichlet Policy Reuse Approach for Financial Markets

Published: 01 Jan 2021 · Last Modified: 15 May 2025 · ICTAI 2021 · CC BY-SA 4.0
Abstract: Reinforcement learning (RL) trains agents to act in an interactive environment so as to maximize cumulative reward, and it has achieved great success in many applications. However, in some domains, such as financial markets, two issues have hampered the wide adoption of RL. First, interacting with the environment is expensive, since global computation is incurred whenever new signals arrive. Second, the agent itself may find it difficult to sense changes in the environment. In this paper, we introduce a general framework named Dirichlet Policy Reuse (DPR) to tackle these issues in a financial investment setting. On one hand, DPR reuses historical experience as much as possible to keep trial and error at a low level. On the other hand, DPR is aggressive enough to explore when the agent encounters disagreement with the model consensus in its policy library. We validate our framework on several tasks in the financial investment context, including single-asset trading and portfolio investment. Experiments show that DPR can reuse historical knowledge, detect potential changes in the environment, and adapt in a timely manner. This paper is a preliminary study of dynamic policy self-adaptation between knowledge reuse and environment exploration. It lays a foundation for future research on RL under non-stationary environments, especially in investment scenarios.
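The abstract's core idea — weighting a library of historical policies and switching to exploration when their consensus breaks down — can be sketched as follows. This is only an illustrative sketch under assumed details (the class name, the Dirichlet-over-library weighting, the vote-based consensus measure, and the reward-driven concentration update are all assumptions, not the paper's actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

class DirichletPolicyReuseSketch:
    """Hypothetical sketch: reuse a library of policies by sampling mixture
    weights from a Dirichlet distribution, and fall back to exploration
    when the weighted action consensus of the library is weak."""

    def __init__(self, n_policies: int, n_actions: int, prior: float = 1.0):
        # Dirichlet concentration per library policy; higher = more trusted.
        self.alpha = np.full(n_policies, prior)
        self.n_actions = n_actions

    def act(self, policy_actions: np.ndarray, disagreement_threshold: float = 0.5) -> int:
        """policy_actions[i] is the discrete action proposed by library policy i."""
        w = rng.dirichlet(self.alpha)  # sample reuse weights over the library
        # Weighted vote: how much Dirichlet mass backs each candidate action.
        votes = np.bincount(policy_actions, weights=w, minlength=self.n_actions)
        if votes.max() < disagreement_threshold:
            # The library disagrees with itself: explore a random action.
            return int(rng.integers(self.n_actions))
        return int(votes.argmax())  # exploit the consensus action

    def update(self, policy_idx: int, reward: float) -> None:
        # Reinforce policies whose advice paid off (clipped at zero).
        self.alpha[policy_idx] += max(reward, 0.0)
```

In this sketch, strong agreement among historically successful policies keeps trial and error low (knowledge reuse), while a fragmented vote triggers exploration, loosely mirroring the reuse/exploration trade-off the paper describes.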