Benchmarking Hierarchical and Spectral Clustering for Geochemical Baseline and Anomaly Detection in Hyper-Arid Soils of Northern Chile

Georginio Ananganó-Alvarado, Brian Keith-Norambuena, Elizabeth J. Lam, Ítalo L. Montofré, Angélica Flores, Carolina Flores, Jaume Bech

Published: 11 Nov 2025, Last Modified: 05 Jan 2026 | Minerals | CC BY-SA 4.0
Abstract: Establishing robust geochemical baselines in the hyper-arid Atacama Desert remains challenging because of extreme climatic gradients, polymetallic mineralisation, and decades of intensive mining. To disentangle natural lithogeochemical signals from anthropogenic inputs, a region-wide, multi-institutional soil dataset (1404 samples; 32 elements) was compiled. The analytical workflow integrated compositional data analysis (CoDA) with isometric log-ratio (ILR) transformation, principal component analysis (PCA), robust principal component analysis (RPCA), and consensus anomaly detection via hierarchical clustering (HC) and spectral clustering (SC), applied both with and without spatial coordinates to capture compositional structure and geographic autocorrelation. Optimal cluster solutions differed among laboratory subsets (k = 2–17), reflecting instrument-specific biases. The dual workflows flagged 76 (geochemical-only) and 83 (geo-spatial) anomalies, of which 33 were jointly identified, yielding high-confidence exclusions. Regional baselines for 13 priority elements were subsequently computed, producing thresholds such as As = 66.9 mg·kg⁻¹, Pb = 53.6 mg·kg⁻¹, and Zn = 166.8 mg·kg⁻¹. Incorporating spatial variables generated more coherent, lithology-aligned clusters without sacrificing sensitivity to geochemical extremes (Jaccard index between the two anomaly sets = 0.26). These findings demonstrate that a reproducible, composition-aware machine learning workflow can separate overlapping geogenic and anthropogenic signatures in heterogeneous terrains. The resulting baselines provide an operational reference for environmental monitoring in northern Chile and a transferable template for other arid mining locations.
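
The reported Jaccard index can be checked against the anomaly counts given in the abstract. The sketch below is illustrative only and is not the authors' code: the sample IDs are hypothetical placeholders constructed to match the reported set sizes (76, 83, and 33 shared), and the names `jaccard`, `flags_geochem`, and `flags_geospatial` are assumptions introduced here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| for two sets of sample IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample IDs, built only to reproduce the reported counts.
shared = set(range(33))                             # flagged by both workflows
flags_geochem = shared | set(range(100, 143))       # 33 + 43 = 76 anomalies
flags_geospatial = shared | set(range(200, 250))    # 33 + 50 = 83 anomalies

# Consensus anomalies are those flagged by both workflows.
consensus = flags_geochem & flags_geospatial
score = jaccard(flags_geochem, flags_geospatial)    # 33 / (76 + 83 - 33) = 33 / 126

print(len(consensus), round(score, 2))  # prints: 33 0.26
```

With 33 shared anomalies out of 126 distinct flagged samples, the overlap is about 0.26, which agrees with the value stated in the abstract.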