Leveraging LLM-based sentiment analysis for portfolio optimization with proximal policy optimization

Published: 07 Jun 2025, Last Modified: 05 Aug 2025 · Practical-DL 2025 · CC BY 4.0
Keywords: reinforcement learning, proximal policy optimization, stock portfolio optimization, sentiment analysis
Abstract: Reinforcement learning (RL) offers adaptive solutions for portfolio optimization, but standard methods such as proximal policy optimization (PPO) rely solely on historical price data and ignore investor sentiment. We introduce sentiment-augmented PPO (SAPPO), a reinforcement learning framework that incorporates real-time financial news sentiment. LLaMA 3.3, a large language model fine-tuned for financial text, extracts daily sentiment scores from Refinitiv news, which are integrated into the agent's decision-making. SAPPO modifies the PPO advantage function with a sentiment term and learns allocation strategies that adapt to both price trends and market sentiment. Experiments on a three-stock portfolio of Google, Microsoft, and Meta show that SAPPO improves the Sharpe ratio from 1.55 to 1.90 and reduces drawdowns relative to PPO. The best performance occurs at $\lambda = 0.1$, supported by ablation studies and $t$-tests showing statistically significant improvements ($p < 0.001$). These results demonstrate that sentiment-aware reinforcement learning enhances trading performance and provides a robust alternative to purely price-driven strategies.
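As a rough illustration only (the symbols $s_t$ for the daily sentiment score and $\tilde{A}_t$ for the augmented advantage are assumed here for exposition, not notation taken from the paper), the sentiment-modified advantage can be pictured as an additive combination
$$\tilde{A}_t = A_t + \lambda\, s_t,$$
where $A_t$ is the standard PPO advantage estimate and $\lambda$ weights the sentiment signal; the abstract reports the best performance at $\lambda = 0.1$.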
Submission Number: 8