MaskSDM with Shapley values to improve flexibility, robustness and explainability in species distribution modelling
Abstract: Species distribution models (SDMs) play a vital role in biodiversity research, conservation planning and ecological niche modelling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species.
However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution.
To overcome these limitations, we build upon and extend MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy, in which input variables are randomly hidden during training to simulate missing or ignored predictors. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Furthermore, we introduce a new method for computing Shapley values with MaskSDM, enabling precise assessments of predictor contributions and improving upon traditional approximations.
We conduct an extensive evaluation of MaskSDM on the global sPlotOpen dataset, modelling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and closely approximates models trained on specific subsets of variables, while also providing key local and global insights into predictor contributions through more accurate Shapley value estimation. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for foundation models for species distribution modelling that can be readily adapted to diverse ecological applications.
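The two mechanisms described in the abstract, random masking of predictors and Shapley values computed from predictions on predictor subsets, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network or how hidden inputs are represented, so `toy_masked_model` is a hypothetical linear stand-in that simply zeroes hidden predictors, and `random_mask`, `exact_shapley` and the masking probability `p_mask` are illustrative names and values. The sketch enumerates every subset of the other predictors, which is only feasible for a handful of variables.

```python
from itertools import combinations
from math import factorial

import numpy as np


def random_mask(n_features, rng, p_mask=0.5):
    """Draw a boolean mask for one training sample; True means visible.

    Assumption: each predictor is hidden independently with probability
    p_mask. The masking schedule actually used by MaskSDM may differ.
    """
    return rng.random(n_features) >= p_mask


def toy_masked_model(x, visible):
    """Hypothetical stand-in for a mask-aware model.

    Hidden predictors are replaced by zero here; a real masked model
    would typically use a learned representation for hidden inputs.
    """
    weights = np.array([2.0, -1.0, 0.5])
    return float(np.sum(weights * np.where(visible, x, 0.0)))


def exact_shapley(model, x):
    """Exact Shapley values, using masked predictions as the value function."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                visible = np.array([j in subset for j in range(n)])
                without_i = model(x, visible)
                visible[i] = True
                with_i = model(x, visible)
                phi[i] += weight * (with_i - without_i)
    return phi


rng = np.random.default_rng(0)
mask = random_mask(3, rng)  # one random training-time mask

x = np.array([1.0, 2.0, 4.0])
phi = exact_shapley(toy_masked_model, x)
all_visible = np.ones(3, dtype=bool)
none_visible = np.zeros(3, dtype=bool)
# Efficiency property: contributions sum to the full-minus-empty prediction.
gap = toy_masked_model(x, all_visible) - toy_masked_model(x, none_visible)
```

Because a model trained with masking can be queried directly on any subset of predictors, the value of each coalition comes from a single forward pass rather than from imputing or sampling the hidden variables. For more than a few predictors the exponential enumeration above would need to be replaced by sampling over coalitions.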