Semi-Offline Reinforcement Learning for Portfolio Optimization

22 Sept 2022 (modified: 08 Jun 2025) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Abstract: We introduce semi-offline reinforcement learning (RL), a new formalization of the sequential decision-making problem for portfolio optimization. Unlike the standard online and fully offline RL settings, semi-offline RL poses the unique challenge of limited access to an actively evolving environment. As a result, existing online/offline RL approaches cannot handle the distributional shift between the fixed observations in the training set and those in an out-of-distribution test domain. In this paper, we propose a novel off-policy RL algorithm named *stationarity-constrained MDP* (SC-MDP), which decouples the previously collected training observations into two streams of *stationary* and *non-stationary* latent variables through a probabilistic inference framework. We demonstrate that, in this way, the learned policies can remain persistently profitable despite rapidly changing environment dynamics. Our approach substantially outperforms existing online RL algorithms, advanced offline RL methods, and state-of-the-art stock prediction models on three real-world financial datasets.
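
The abstract gives no implementation details for SC-MDP, so the snippet below is only a minimal sketch of one way the described decoupling into stationary and non-stationary latent variables could be set up: a shared encoder with two variational heads using the reparameterization trick. All names, dimensions, and the PyTorch framing are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch only: a hypothetical encoder splitting observations into
# "stationary" and "non-stationary" latent streams via reparameterized
# Gaussian latents. Not the authors' SC-MDP implementation.
import torch
import torch.nn as nn


class DecoupledEncoder(nn.Module):
    """Maps an observation to two latent streams, z_stationary and z_nonstationary."""

    def __init__(self, obs_dim: int, latent_dim: int = 16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # Each head outputs a mean and log-variance for its latent stream.
        self.stationary_head = nn.Linear(64, 2 * latent_dim)
        self.nonstationary_head = nn.Linear(64, 2 * latent_dim)

    def forward(self, obs: torch.Tensor):
        h = self.backbone(obs)
        mu_s, logvar_s = self.stationary_head(h).chunk(2, dim=-1)
        mu_n, logvar_n = self.nonstationary_head(h).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        z_s = mu_s + torch.randn_like(mu_s) * (0.5 * logvar_s).exp()
        z_n = mu_n + torch.randn_like(mu_n) * (0.5 * logvar_n).exp()
        return z_s, z_n


if __name__ == "__main__":
    encoder = DecoupledEncoder(obs_dim=32)
    z_s, z_n = encoder(torch.randn(8, 32))
    print(z_s.shape, z_n.shape)  # torch.Size([8, 16]) torch.Size([8, 16])
```

In a full training loop, conditioning the policy primarily on the stationary stream (with regularizers such as KL terms on both latents) would be one natural way to pursue the robustness to shifting dynamics that the abstract claims, though the paper's actual objective is not specified here.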
Anonymous Url: I certify that there is no URL (e.g., GitHub page) that could be used to find the authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/semi-offline-reinforcement-learning-for/code)