Gaussian Process Spatial Clustering

TMLR Paper2523 Authors

14 Apr 2024 (modified: 17 Sept 2024) | Withdrawn by Authors | CC BY 4.0
Abstract: Spatial clustering is a common unsupervised learning problem with many applications in areas such as public health, urban planning, and transportation, where the goal is to identify clusters of similar locations based on regionalization as well as patterns in characteristics over those locations. Unlike standard clustering, a well-studied area with a rich literature including methods such as K-means clustering, spectral clustering, and hierarchical clustering, spatial clustering is a relatively sparse area of study due to inherent differences between the spatial domain of the data and its corresponding covariates. In the case of our motivating example, the American Community Survey dataset, spatial differences in census tract regions cannot be directly compared to differences in participant survey responses to indicators such as employment status or income. As such, in this paper, we develop a spatial clustering algorithm called Gaussian Process Spatial Clustering (GPSC), which clusters functions between data by leveraging the flexibility of Gaussian processes, and we extend it to the case of clustering geospatial data. We provide theoretical guarantees and demonstrate its ability to recover true clusters in several simulation studies, and we apply it to a real-world dataset to identify clusters of tracts in North Carolina based on socioeconomic and environmental indicators associated with health and cancer risk.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The initial submission (submission number 2509) was desk rejected due to a missing header in the paper format, so there is no public URL available for it. The formatting has been corrected and we are now resubmitting.
Assigned Action Editor: ~Novi_Quadrianto1
Submission Number: 2523