A conditional masked autoencoder network based on efficient multiple-head self-attention for characterizing heterogeneous reservoirs

Published: 01 Jan 2026, Last Modified: 17 Sept 2025, Expert Syst. Appl. 2026, CC BY-SA 4.0
Abstract: Conditioning data, such as wells, outcrops, and seismic data, are pivotal for characterizing realistic and trustworthy reservoir structures and properties. Establishing geological models that conform to multiple-scale spatial features while honoring hard and/or soft conditioning data has been a longstanding challenge. To address the difficulties of conditional simulation and of capturing complex multiple-scale spatial patterns in reservoir characterization, we propose EMSA-CMAE, a conditional masked autoencoder network for characterizing heterogeneous reservoirs based on an efficient multiple-head self-attention mechanism. This method integrates semantic inpainting from computer vision with reservoir characterization and allows conditioning data to be embedded naturally within the masked autoencoder network. Moreover, the efficient multiple-head self-attention (EMSA) module significantly reduces the computational overhead of EMSA-CMAE. We use three sets of training images to verify the applicability and robustness of EMSA-CMAE. In the experiments, the masking ratio is set to 90%, and statistical methods, such as histograms, RMSE, MDS maps, and variograms, are used to measure the attribute proportion, pixel error, and spatial heterogeneity of the realizations. The average RMSE across all three datasets is 0.17, and a comparative analysis shows that using the semantic repair strategy and the EMSA module improves the ability of EMSA-CMAE to reproduce the spatial structures and properties of heterogeneous reservoirs.
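To make the 90% masking ratio and the RMSE metric concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes pixel-level random masking (the paper may mask patches instead), and the helper names `random_mask` and `rmse`, the `keep` argument for conditioning cells, and the toy 64 x 64 grid are all hypothetical.

```python
import numpy as np

def random_mask(shape, mask_ratio=0.9, keep=None, seed=0):
    """Boolean mask (True = hidden) hiding about mask_ratio of the cells.

    keep: optional boolean array of conditioning cells (e.g. well
    locations) that are forced to stay visible. Hypothetical helper.
    """
    rng = np.random.default_rng(seed)
    n = int(np.prod(shape))
    n_masked = int(round(mask_ratio * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_masked]] = True  # hide a random subset
    mask = mask.reshape(shape)
    if keep is not None:
        mask[keep] = False  # conditioning data is never hidden
    return mask

def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Toy binary "training image" and three assumed well locations.
image = (np.random.default_rng(1).random((64, 64)) > 0.5).astype(float)
wells = np.zeros((64, 64), dtype=bool)
wells[[5, 20, 40], [10, 30, 50]] = True

mask = random_mask(image.shape, mask_ratio=0.9, keep=wells)

# Placeholder reconstruction: fill hidden cells with the visible mean.
# A trained network would predict these cells instead.
recon = image.copy()
recon[mask] = image[~mask].mean()
error = rmse(image, recon)
```

In this sketch the conditioning cells act as the visible input that the reconstruction must honor, which mirrors the inpainting view of conditional simulation described in the abstract.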