Automated "E"-aware data processing for construction ESG using building information modeling and large language model

Published: 2026, Last Modified: 07 Nov 2025 | Adv. Eng. Informatics 2026 | CC BY-SA 4.0
Abstract: Environmental, Social, and Governance (ESG) assessment and disclosure are critical for architecture, engineering, and construction (AEC) companies to market their financial results, reputational position, and compliance with regulatory requirements. Within this framework, the environmental (“E”) dimension presents unique and formidable data management challenges distinct from social and governance aspects. Specifically, the complex interplay of quantitative metrics and qualitative descriptions within “E”-aware data (e.g., measurable resource consumption alongside descriptive material sourcing practices, emissions figures coupled with compliance narratives), amplified by its sheer volume and the persistent ambiguity of environmental indicators and reporting standards, poses significant obstacles to effective “E”-aware data disclosure. Large Language Models (LLMs) possess inherent advantages in processing such complex environmental information due to their proficient language processing and generalization capabilities. Nonetheless, the development of LLM-based methods explicitly tailored for environmental data management within the construction sector remains underexplored. To this end, this study introduces an automated, LLM-enhanced “E”-aware data processing approach for the construction industry. The innovation of this framework is threefold. First, fifteen “E”-aware indicators are meticulously crafted to align with the specific needs of construction entities. Second, an “E”-aware algorithm, integrated within the Building Information Modeling (BIM) framework, is devised to streamline the aggregation and quantification of environmental data. Third, an LLM-enhanced complex structured data processing mechanism using retrieval-augmented generation (RAG) is proposed to facilitate the efficient processing of “E”-aware data pertinent to construction projects.
An illustrative case study is employed to validate the feasibility and efficacy of the proposed methodology. The results demonstrate that the developed automated RAG-LLM enhanced framework significantly advances current practice by: (1) enabling standardized “E”-aware data specifications and source mapping; (2) drastically reducing processing time for large-scale ESG documentation (saving 64.4% of time); and (3) providing a robust solution for handling multi-source, multi-format data, thereby enhancing the efficiency and reliability of environmental management and ESG disclosure in the AEC industry.