Sentiment-augmented reinforcement learning for portfolio optimization with large language models

10 Aug 2025 (modified: 27 Oct 2025) · Submitted to NeurIPS Lock-LLM Workshop 2025 · CC BY 4.0
Keywords: Large Language Models, NLP, stock portfolio optimization, sentiment analysis
Abstract: Conventional reinforcement learning (RL) methods for portfolio optimization, such as proximal policy optimization (PPO), rely solely on historical price data and overlook unstructured market signals such as investor sentiment. This paper introduces sentiment-augmented PPO (SAPPO), a novel RL framework that incorporates daily asset-level sentiment, extracted from Refinitiv financial news using large transformer-based language models, into both the state representation and the policy gradient. Specifically, SAPPO modifies the advantage function with a sentiment-weighted term, enabling context-aware policy updates aligned with dynamic investor beliefs. This design improves adaptability under market nonstationarity and serves as a behaviorally informed extension of PPO. Empirical evaluation shows that SAPPO significantly outperforms vanilla PPO, raising the Sharpe ratio from 1.67 to 2.07 and annualized returns from 57% to 83% with only a modest increase in drawdown. Extensive ablation studies confirm that the performance gains arise from the sentiment-guided updates. The results demonstrate the effectiveness of multimodal RL strategies that integrate financial text signals to enhance decision-making under uncertainty.
Submission Number: 1
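
A minimal sketch of how a sentiment-weighted advantage might enter the PPO clipped objective, as described in the abstract. The weighting coefficient `lam`, the per-asset sentiment score `sentiment`, and the function name are illustrative assumptions rather than the paper's actual implementation:

```python
import torch

def sentiment_weighted_ppo_loss(
    log_probs_new: torch.Tensor,   # log pi_theta(a_t | s_t) under the current policy
    log_probs_old: torch.Tensor,   # log pi_theta_old(a_t | s_t) from the rollout policy
    advantages: torch.Tensor,      # price-based advantage estimates A_t (e.g., from GAE)
    sentiment: torch.Tensor,       # daily asset-level sentiment score, assumed in [-1, 1]
    lam: float = 0.5,              # sentiment weighting coefficient (assumed value)
    clip_eps: float = 0.2,         # standard PPO clipping parameter
) -> torch.Tensor:
    """Clipped PPO surrogate with a sentiment-weighted advantage term.

    The advantage is rescaled by (1 + lam * sentiment), so updates are amplified
    when the price-based advantage and news sentiment agree and damped when they
    disagree. This is one illustrative reading of SAPPO's sentiment-weighted
    advantage, not the authors' exact formulation.
    """
    # Sentiment-weighted advantage: A_t^sent = A_t * (1 + lam * s_t)
    adv_sent = advantages * (1.0 + lam * sentiment)

    # Standard PPO probability ratio and clipped surrogate objective
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * adv_sent
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv_sent

    # Negate because optimizers minimize; PPO maximizes the surrogate
    return -torch.min(unclipped, clipped).mean()
```

In such a setup, the sentiment score would also be concatenated to the price-based state vector, consistent with the abstract's description of sentiment entering both the state representation and the policy gradient.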