An explainable prediction of shale gas ultimate recovery based on Tree-Based ensemble machine learning and Shapley additive explanations

Zheyuan Zhang, Min Pang, Zhaoming Zhou, Yichang Zhang

Published: 2025, Last Modified: 01 Apr 2026. Appl. Intell. 2025. CC BY-SA 4.0
Abstract: Accurate prediction of Estimated Ultimate Recovery (EUR) and identification of the key engineering and geological parameters are essential for the large-scale, efficient development of shale gas. Machine learning (ML) algorithms have proven to be effective data-driven approaches for this purpose. However, individual ML algorithms are often sensitive to outliers in the data and to correlations among feature variables, which can cause predictions to deviate from actual outcomes. Stacking, an ensemble learning approach, can significantly improve the accuracy of shale gas EUR prediction. This study presents an intelligent method for predicting the EUR of shale gas using stacked ensemble learning. The method integrates RF, AdaBoost, XGBoost, CatBoost, and LightGBM as base learners, with LR as the meta-learner. The hyperparameters of each model are tuned with the Tree-structured Parzen Estimator (TPE) optimization algorithm. The proposed model is then compared with individual ML models and other stacking combinations to assess its performance. The Generated Stacking model demonstrated the best performance in EUR prediction, with an RMSE of 0.8008, an R² of 0.9069, and a MAPE of 5.25%. It was then validated on a real dataset from the Sichuan Weiyuan field, where it exhibited strong generalization capability. Finally, the Shapley additive explanations (SHAP) method was used to reveal how the individual features affect the EUR of shale gas and to analyze the dependencies and interactions among them. The results indicate that Proppant Loading is the primary factor influencing EUR, followed by Lateral Length and Well Spacing. This work provides valuable insights for optimizing production planning to maximize the EUR of shale gas.
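The stacking arrangement described in the abstract (several tree-based base learners feeding a simple meta-learner, scored by RMSE, R², and MAPE) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that LR denotes linear regression, uses only scikit-learn estimators (with `GradientBoostingRegressor` standing in for XGBoost, CatBoost, and LightGBM), fixes the hyperparameters instead of tuning them with TPE, and fits synthetic data rather than the paper's shale gas dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; NOT the paper's dataset).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
y = y - y.min() + 1.0  # shift the target to be positive so MAPE is well defined

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners: tree-based ensembles. GradientBoostingRegressor is a stand-in
# for the gradient-boosting libraries named in the abstract.
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
]

# Meta-learner: linear regression fitted on out-of-fold base-learner predictions.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The three metrics reported in the abstract.
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = float(r2_score(y_test, y_pred))
mape = float(np.mean(np.abs((y_test - y_pred) / y_test)) * 100.0)

print(f"RMSE={rmse:.4f}  R2={r2:.4f}  MAPE={mape:.2f}%")
```

`StackingRegressor` trains the meta-learner on cross-validated predictions of the base learners, which keeps the meta-learner from simply memorizing base-learner fits on the training set. The numbers printed here describe the synthetic data only and are not comparable to the values reported in the abstract.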