Language Model-Based Agents to Learn Policy for Text-Based Markov Decision Processes

ACL ARR 2026 January Submission6953 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Markov Decision Process, LLM Inference
Abstract: Large Language Models (LLMs) have demonstrated strong capabilities on a variety of reasoning tasks, but they typically struggle when asked to solve complex Markov Decision Process (MDP) problems directly, owing to the sequential and algorithmic nature of such problems. In this work, we propose OptiAct, an algorithmic framework that leverages LLMs by decomposing MDP problems into structured subtasks, such as component extraction, mathematical formulation, code generation, and execution, in order to select the Optimal Actions. Our approach is a cooperative multi-agent system whose agents handle verification, component extraction, formulation, and code generation. This modular design enables validation at each stage while maintaining interpretability throughout the solution process. To evaluate the method systematically, we construct real-world decision-making problems of varying complexity from multiple sources and test our pipeline with five different large language models. Experiments demonstrate that algorithmic decomposition significantly enhances LLMs' effectiveness in solving finite-horizon MDPs, highlighting the necessity and benefits of structured reasoning over direct, unstructured solution generation.
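The abstract's code-generation-and-execution stage presumably produces a programmatic solver for the extracted finite-horizon MDP. A minimal sketch of what such emitted code could look like is backward induction (dynamic programming over the horizon); the function name, the table layout, and the toy two-state MDP below are all illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a solver that a code-generation stage might emit
# for a small finite-horizon MDP, using backward induction.
# P[(s, a)] -> list of (next_state, prob); R[(s, a)] -> immediate reward.

def backward_induction(states, actions, P, R, horizon):
    """Return value tables V[t][s] and a policy pi[t][s] for each step t."""
    # V[horizon] is the terminal value (zero here).
    V = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    pi = [{} for _ in range(horizon)]
    # Sweep backward from the last decision step to the first.
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # Q-value: immediate reward plus expected future value.
                q = R[(s, a)] + sum(p * V[t + 1][s2] for s2, p in P[(s, a)])
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s] = best_q
            pi[t][s] = best_a
    return V, pi

# Toy two-state, two-action MDP (deterministic transitions).
states = ["s0", "s1"]
actions = ["stay", "move"]
P = {
    ("s0", "stay"): [("s0", 1.0)],
    ("s0", "move"): [("s1", 1.0)],
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "move"): [("s0", 1.0)],
}
R = {
    ("s0", "stay"): 0.0, ("s0", "move"): 1.0,
    ("s1", "stay"): 2.0, ("s1", "move"): 0.0,
}
V, pi = backward_induction(states, actions, P, R, horizon=3)
print(pi[0]["s0"])  # optimal first action from s0 -> "move"
```

In the paper's framing, an earlier agent would fill in `states`, `actions`, `P`, and `R` from the text description, and a final execution step would run code of this kind to read off the optimal actions.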
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: AI/LLM Agents, NLP Applications, Generation and Automated Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6953