Obscured-ensemble models for genomic prediction

Rounak Saha, Xuan Guo, Amir Morshedian, Jia Sun, Robert Duncan, Mike Domaratzki

Published: 14 Nov 2025, Last Modified: 27 Jan 2026 | PLOS One | CC BY-SA 4.0

Abstract: Genomic prediction (GP) uses dense whole-genome marker sets from lines of a crop to predict agronomic traits for untested genotypes. In recent years, deep learning (DL) approaches for GP have demonstrated state-of-the-art results. However, DL outcomes for GP vary substantially, because the success of DL depends on the architecture of the model used, the amount of data available, and the population structure of the individuals in the training set. In this paper, we consider an obscured model for GP, in which the model is not provided with genomic content. The obscured model was intended to evaluate the possibility of so-called shortcut learning in GP. We conclude that we can perform GP using the obscured model with only 20% of the obscured markers from each reference genotype. This selective feature usage significantly enhances the efficiency of our model without compromising accuracy. By eliminating markers, we demonstrate that the model is not relying on linkage to perform shortcut learning. Further, we consider a deep learning ensemble method for GP based on the obscured model. The ensemble model we develop here succeeds as a method for GP by using the similarity to each element of a training set of genotypes, as well as the performance of those genotypes. We evaluate the obscured ensemble model for GP and demonstrate that it is successful even with a limited number of genotypes used for prediction. Further, random selection of a subset of genotypes is sufficient to ensure successful performance.

External IDs: doi:10.1371/journal.pone.0334239