Sentiment-weighted advantage updates for portfolio optimization with reinforcement learning

18 Sept 2025 (modified: 20 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: reinforcement learning, sentiment, large language models
Abstract: Conventional reinforcement learning (RL) methods for portfolio optimization, such as proximal policy optimization (PPO), rely mainly on historical price data and overlook unstructured market signals such as investor sentiment. This paper introduces Sentiment-Augmented PPO (SAPPO), an RL framework that integrates daily asset-level sentiment into both the state representation and the policy update. The core innovation is a sentiment-weighted advantage function, in which sentiment scores act as dynamic multipliers on advantage estimates, thereby shaping policy gradients in a behaviorally informed manner. This design differs from prior sentiment-aware approaches that inject sentiment only into state vectors or reward shaping, and it enables more stable, context-sensitive learning under market nonstationarity. Empirical evaluation on Refinitiv news and NASDAQ-100 stocks shows that SAPPO outperforms vanilla PPO and sentiment-in-state/reward baselines, raising the Sharpe ratio from 1.67 to 2.07 and annualized returns from 57% to 83% with only a modest increase in drawdown. Extensive ablations confirm that the gains stem from the sentiment-weighted update mechanism rather than from any specific sentiment model. These results highlight the potential of integrating behavioral signals into RL for financial decision-making.
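The abstract does not spell out the exact form of the multiplier, so the following is only a minimal sketch of what a sentiment-weighted advantage update could look like: GAE-style advantage estimation followed by a sentiment-derived scaling, fed into the standard PPO clipped surrogate. The multiplier form `1 + beta * sentiment`, the coefficient `beta`, and all function names are assumptions for illustration, not the paper's implementation.

```python
import torch


def sentiment_weighted_advantages(rewards, values, sentiment,
                                  gamma=0.99, lam=0.95, beta=0.5):
    """GAE advantages scaled by a sentiment-derived multiplier.

    Assumed reading of the abstract: per-step sentiment scores in [-1, 1]
    act as dynamic multipliers on the advantage estimates, so policy
    gradients are amplified or damped by prevailing sentiment.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    # Hypothetical multiplier form: 1 + beta * sentiment, where beta
    # controls how strongly sentiment reshapes the update.
    weights = 1.0 + beta * sentiment
    return advantages * weights


def ppo_clipped_loss(log_probs, old_log_probs, weighted_adv, clip_eps=0.2):
    """Standard PPO clipped surrogate, applied to the weighted advantages."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * weighted_adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * weighted_adv
    return -torch.min(unclipped, clipped).mean()
```

Under this reading, vanilla PPO is recovered when `beta = 0`, which is one way the ablations could isolate the contribution of the weighting mechanism.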
Primary Area: reinforcement learning
Submission Number: 14185