Reinforcement learning (RL) has shown significant promise in stock trading. A typical solution optimizes cumulative returns on historical offline data. However, it may produce less generalizable policies that merely "memorize" optimal buying and selling actions from the offline data while neglecting the non-stationary nature of the financial market. We frame stock trading as a specific type of offline RL problem. Our method, MetaTrader, makes two key contributions. First, it introduces a novel bilevel actor-critic method that spans both the original stock data and its transformations. The underlying idea is that an effective policy should generalize to out-of-distribution data. Second, we propose a novel variant of conservative TD learning, utilizing an ensemble-based TD target to mitigate value overestimation, particularly in scenarios with limited offline data. Our empirical findings on two publicly available datasets demonstrate the superior performance of MetaTrader over existing methods, including both RL-based approaches and stock prediction models.
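The abstract does not spell out the exact form of the ensemble-based TD target, but a common way to make a TD target conservative is to take the minimum over an ensemble of target critics' next-state value estimates. The sketch below illustrates that idea only; the function name and the minimum-over-ensemble choice are assumptions for illustration, not the paper's confirmed formulation.

```python
import numpy as np

def ensemble_td_target(reward, gamma, next_q_values, done):
    """One common conservative TD target (assumed here, not taken from the
    paper): back up the minimum over K target critics' estimates of the
    next state-action value, which dampens value overestimation.

    next_q_values: array of shape (K,), one estimate per ensemble member.
    done: 1.0 if the episode terminated at this step, else 0.0.
    """
    conservative_next_q = np.min(next_q_values)  # pessimistic ensemble estimate
    return reward + gamma * (1.0 - done) * conservative_next_q

# Toy example: three target critics disagree on the next-state value;
# the target backs up the most pessimistic one (1.5).
target = ensemble_td_target(reward=1.0, gamma=0.99,
                            next_q_values=np.array([2.0, 1.5, 2.5]),
                            done=0.0)
print(target)  # 1.0 + 0.99 * 1.5 = 2.485
```

In practice the critic ensemble would be K separate target networks evaluated at the next state and the policy's next action; taking the minimum (as in clipped double-Q methods) trades a small pessimistic bias for robustness to overestimation, which matters most when offline data is scarce.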