Sustainable Investment Decision-Making on Office Buildings using Reinforcement Learning and Large Language Models

Ziru Tao; Paul Baguley; Rashid Maqbool; Obuks Ejohwomu

Published: 08 Oct 2025, Last Modified: 17 Oct 2025, Agents4Science, CC BY 4.0

Keywords: Reinforcement learning, Cost engineering, Environmental, Social, and Governance (ESG), Markov decision process (MDP), Office buildings, Large language models (LLMs)

TL;DR: An RL–LLM cost-engineering framework that monetizes ESG to optimize life-cycle design–construction–operation decisions for office buildings, yielding lower energy/carbon and better NPV in US/UK case studies.

Abstract: This study develops a reinforcement learning (RL) framework to optimize lifecycle investment decisions for sustainable office buildings from a cost engineering perspective, translating Environmental, Social, and Governance (ESG) impacts into monetized drivers for decision support. Sequential choices across design, construction, and operation are modeled as a Markov Decision Process (MDP), and a policy is trained with a Deep Q-Network (DQN); the RL discount factor is aligned with the economic discount rate to avoid double counting. A large language model (LLM), ChatGPT-5, is used to extract parameters from unstructured guidance and to generate stakeholder-facing explanations of the learned policies. Across two case studies in the United States and the United Kingdom, the RL strategy achieves 37.5--45.0\% lower annual energy use and 31.0--36.9\% lower total lifecycle carbon than conventional practice. Despite a 4--6\% higher initial cost, it reduces financial lifecycle cost by USD 0.42 million and GBP 1.01 million, and reduces the net present value (NPV) of societal cost, i.e., monetized carbon and productivity effects, by USD 3.50 million and GBP 3.00 million. Results remain robust under $\pm 20\%$ parameter noise and a $+2^\circ\mathrm{C}$ climate scenario. Limitations include reliance on secondary estimates for social valuation, simplified transition dynamics, and automated evaluation of LLM explanations; future work will incorporate expert blind review and real project validation.
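The abstract does not state how the discount factor is aligned with the economic discount rate. A minimal sketch of one common reading, assuming annual decision steps and the relation gamma = 1 / (1 + r), is given below. Under that assumption, the discounted return of a sequence of undiscounted annual cash flows equals their NPV, so discounting is applied exactly once. The function names, the 5% rate, and the cash-flow figures are hypothetical illustrations, not values from the paper.

```python
from typing import Sequence


def discount_factor(annual_rate: float) -> float:
    """RL discount factor implied by an annual economic discount rate.

    Assumes one decision step per year (an assumption, not from the paper).
    """
    return 1.0 / (1.0 + annual_rate)


def npv(cash_flows: Sequence[float], annual_rate: float) -> float:
    """Net present value; cash_flows[t] occurs at the end of year t (t = 0 is today)."""
    return sum(cf / (1.0 + annual_rate) ** t for t, cf in enumerate(cash_flows))


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Standard RL return: sum over t of gamma**t * rewards[t], computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical figures: an upfront cost followed by four years of savings.
rate = 0.05
cash_flows = [-100.0, 30.0, 30.0, 30.0, 30.0]

gamma = discount_factor(rate)
npv_value = npv(cash_flows, rate)
rl_return = discounted_return(cash_flows, gamma)

# With undiscounted cash flows as rewards, the RL return matches the NPV.
assert abs(npv_value - rl_return) < 1e-9
```

If the rewards passed to the agent were already present-valued and gamma were also set below 1, each cash flow would be discounted twice, which is the double counting the abstract refers to.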

Supplementary Material: zip

Submission Number: 213
