Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation

Published: 10 Oct 2024, Last Modified: 15 Nov 2024
Venue: Pluralistic-Alignment 2024
License: CC BY 4.0
Keywords: LLMs, RLMF, RLHF, Alignment
TL;DR: We introduce a novel approach for regime adaptation in financial markets using RL-based dynamic alignment of LLMs with natural, automatic market rewards derived from daily events.
Abstract: We propose a regime-adaptive execution methodology for the financial market domain to tackle the regime-switching problem. Dynamic regime switching, i.e., shifts in the underlying correlation and covariance of the true (hidden) market variables, diminishes the robustness of expert/specialist models on downstream tasks such as forecasting or market-movement prediction from unseen, online data. Our method uses natural, intrinsic market rewards for adaptive RL alignment (RLMF) of expert LLMs, together with a teacher-student, repeating dual-phase (train, execute) pipeline that consistently outperforms SOTA trillion-parameter models like GPT-4o. Our approach does not rely on the strength of the underlying expert models -- any contemporary off-the-shelf foundational LLM is compatible with our (plug-and-play) algorithm. We use the Llama-2 7B-parameter class of base models to show the efficacy of our method, which outperforms both generalist and specialist classes of expert models and attains strong empirical results, including a 15\% increase in predictive accuracy on concurrent stock-movement prediction benchmarks.
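The abstract does not give implementation details, so the following is only a minimal sketch of what an RLMF-style reward and dual-phase loop could look like. It assumes the natural market reward is the agreement between the model's predicted movement direction and the realized next-day return, and that the pipeline alternates a train phase (aligning the student on collected rewards) with an execute phase (predicting on new market days). All names here (market_reward, execute_phase, train_phase, MarketDay) are illustrative, not from the paper.

```python
# Illustrative sketch of an RLMF-style reward and repeating dual-phase loop.
# Assumption (not from the paper): the automatic market reward is +1 when the
# model's predicted movement direction matches the realized next-day return,
# and -1 otherwise.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MarketDay:
    news: str                # daily event / news text fed to the LLM
    realized_return: float   # next-day return observed after the prediction


def market_reward(predicted_up: bool, realized_return: float) -> float:
    """Automatic reward from market feedback: agreement with realized movement."""
    went_up = realized_return > 0.0
    return 1.0 if predicted_up == went_up else -1.0


def execute_phase(predict: Callable[[str], bool],
                  days: List[MarketDay]) -> List[Tuple[str, bool, float]]:
    """Run the student model online and collect (prompt, action, reward) triples."""
    trajectories = []
    for day in days:
        action = predict(day.news)                        # predicted direction
        reward = market_reward(action, day.realized_return)
        trajectories.append((day.news, action, reward))
    return trajectories


def train_phase(trajectories: List[Tuple[str, bool, float]]) -> None:
    """Placeholder for the RL alignment update (e.g., a PPO-style policy step)."""
    avg_reward = sum(r for _, _, r in trajectories) / max(len(trajectories), 1)
    print(f"aligning student on {len(trajectories)} samples, mean reward {avg_reward:+.2f}")


if __name__ == "__main__":
    # Toy data and a trivial stand-in policy, only to make the loop runnable.
    days = [MarketDay("earnings beat", 0.02), MarketDay("guidance cut", -0.03)]
    naive_policy = lambda news: "beat" in news
    for _ in range(2):                   # repeating (train, execute) pipeline
        traj = execute_phase(naive_policy, days)
        train_phase(traj)
```

In the paper's setting, the stand-in policy would be the expert LLM being aligned, and the train phase would be the RLMF update rather than a logging stub.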
Submission Number: 6