Abstract: Spatial data is generated daily from numerous sources such as GPS-enabled devices, consumer applications (e.g., Uber, Strava), and social media (e.g., location-tagged posts). This exponential growth in spatial data is driving the development of efficient spatial data processing systems. In this study, we enhance spatial indexing with a machine-learned search technique developed for single-dimensional sorted data. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. By instance-optimizing each partitioning technique, we demonstrate that: (i) grid-based index structures outperform tree-based ones (by 1.23x to 2.47x), (ii) learning-enhanced spatial index structures are faster than their original counterparts (by 1.44x to 53.34x), (iii) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, (iv) the benefit of machine-learned search decreases in the presence of other compute-intensive operations (e.g., scan costs in higher-selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could be mitigated by linearizing the indexed partitions.
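To make the idea of machine-learned search within a partition concrete, the following is a minimal sketch, not the implementation evaluated in this study. The abstract does not name the learned model, so a single least-squares line mapping an x-coordinate to its position in the sorted partition is assumed here as a stand-in. The names `LearnedPartition`, `lower_bound`, and `range_query` are hypothetical. The model predicts a position, the search is confined to a window given by the largest training error, and the remaining dimension is filtered by a scan.

```python
import bisect
import math


class LearnedPartition:
    """Points of one partition, sorted by x and searched with a linear model.

    Assumption: a single linear regression stands in for the learned model.
    """

    def __init__(self, points):
        if not points:
            raise ValueError("partition must contain at least one point")
        self.points = sorted(points)  # sort by x, then y
        self.xs = [p[0] for p in self.points]
        n = len(self.xs)
        # Least-squares fit of position ~ slope * x + intercept.
        mean_x = sum(self.xs) / n
        mean_pos = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.xs)
        cov = sum((x - mean_x) * (i - mean_pos) for i, x in enumerate(self.xs))
        self.slope = cov / var if var > 0 else 0.0
        self.intercept = mean_pos - self.slope * mean_x
        # Largest prediction error over the indexed keys bounds the search window.
        self.max_err = max(abs(self._predict(x) - i) for i, x in enumerate(self.xs))

    def _predict(self, x):
        return self.slope * x + self.intercept

    def lower_bound(self, x):
        """Index of the first point whose x-coordinate is >= x."""
        n = len(self.xs)
        guess = self._predict(x)
        lo = min(n, max(0, math.floor(guess - self.max_err)))
        hi = max(lo, min(n, math.ceil(guess + self.max_err) + 1))
        i = bisect.bisect_left(self.xs, x, lo, hi)
        # The error bound only holds for indexed keys, so verify the result
        # and fall back to a full binary search if the window missed.
        left_ok = i == 0 or self.xs[i - 1] < x
        right_ok = i == n or self.xs[i] >= x
        if left_ok and right_ok:
            return i
        return bisect.bisect_left(self.xs, x)

    def range_query(self, x_min, x_max, y_min, y_max):
        """Learned search on x, then a scan that filters on y."""
        out = []
        i = self.lower_bound(x_min)
        while i < len(self.points) and self.xs[i] <= x_max:
            px, py = self.points[i]
            if y_min <= py <= y_max:
                out.append((px, py))
            i += 1
        return out
```

In this sketch the learned model only speeds up locating the start of the x-range. The scan that follows costs the same regardless of how that start was found, which is consistent with finding (iv): as selectivity grows, the scan dominates and the advantage over binary search shrinks.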