Random forest modelling of remotely sensed land cover data to identify crime hot spots in urban areas
Abstract: This study evaluates the effectiveness of integrating high-resolution remote sensing (RS) data with machine learning (ML) techniques to identify criminogenic environments in urban areas. We employ an unsupervised ISO clustering method to classify land cover from aerial imagery, thereby capturing fine-scale environmental details that are often overlooked in traditional analyses. These clusters are linked to both crime and non-crime events through a presence/absence (case–control) framework, a methodology adapted from species distribution studies, which enables a micro-environmental examination of crime locations. In addition to RS-derived land-cover predictors, the study incorporates socio-economic and demographic variables, as well as a centrality indicator that proxies the intensity of urban activity. A Random Forest classifier models the likelihood of street theft incidents based on these predictors. The model achieves robust performance, with an F1-score of 0.88 ± 0.03 as determined by K-fold cross-validation. To enhance model interpretability, we apply SHapley Additive exPlanations (SHAP). The findings demonstrate that integrating RS data with ML techniques offers a valuable tool for identifying and mapping criminogenic environments. The resulting risk map of Stockholm highlights key urban areas with elevated street theft risk, offering guidance for targeted crime prevention and urban planning strategies. While our workflow simplifies some technical steps compared to other RS + ML pipelines, it still requires GIS and ML competence to implement effectively. This approach reduces, but does not eliminate, sensitivity to spatial unit choice (MAUP) and spatial data dependencies.
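The modelling step summarised in the abstract (presence/absence labels, land-cover and contextual predictors, a Random Forest, and an F1-score reported as mean ± standard deviation over K folds) can be illustrated with a minimal sketch. This is not the authors' code or data: the predictors are synthetic stand-ins, the variable names (`land_cover`, `income`, `centrality`), the five land-cover clusters, the sample size, the choice of K = 5, and the use of scikit-learn are all assumptions made for illustration. The SHAP step is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n = 400  # hypothetical number of sampled locations (crime + non-crime)

# Synthetic stand-ins for the predictor groups named in the abstract.
land_cover = rng.integers(0, 5, size=n)  # assumed: 5 ISO land-cover clusters
income = rng.normal(size=n)              # assumed socio-economic variable
centrality = rng.normal(size=n)          # assumed urban-activity proxy

# Synthetic presence (1) / absence (0) label, for illustration only.
signal = 1.5 * centrality + 0.8 * (land_cover == 2) - 0.5 * income
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# One-hot encode the categorical cluster id and stack with numeric predictors.
X = np.column_stack([np.eye(5)[land_cover], income, centrality])

model = RandomForestClassifier(n_estimators=200, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # K = 5 is an assumption

# One F1-score per fold; the summary is reported as mean ± standard deviation.
f1_scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print(f"F1 = {f1_scores.mean():.2f} ± {f1_scores.std():.2f}")
```

Because the data here are simulated, the printed score says nothing about the study's reported 0.88 ± 0.03; the sketch only shows how a fold-wise F1 summary of that form is typically produced.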
External IDs: doi:10.1007/s44327-025-00171-2