Unifying Knowledge from Diverse Datasets to Enhance Spatial-Temporal Modeling: A Granularity-Adaptive Geographical Embedding Approach
Abstract: Spatio-temporal forecasting holds promise for discovering evolutionary patterns in geographical scientific data. However, geographical scientific datasets are often collected manually across separate studies, resulting in short time spans and small data scales. This hinders existing methods, which rely on rich historical data for each individual entity. In this paper, we argue that heterogeneous datasets from different studies can provide complementary insights into the same underlying system, helping improve predictions for geographical entities with limited historical data. To this end, we propose a Segment Quadtree Geographical Embedding Framework (SQGEF). SQGEF integrates knowledge from datasets with varied target entities, time spans, and observation variables to learn unified representations for multi-granularity entities, including those absent during training. Specifically, we propose a novel data structure, Segment Quadtree, that flexibly accommodates entities of varying granularities. SQGEF not only captures multi-level interactions from grid data but also extracts nested relationships and human-defined boundaries from diverse entities, enabling a comprehensive understanding of complex geographical structures.
Experiments on real-world datasets demonstrate that SQGEF effectively represents unseen geographical entities and enhances performance for various models.
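The abstract does not specify how Segment Quadtree is built, so the following is only an illustrative sketch of the general idea it names: a quadtree whose nodes carry a representation, queried with a segment-tree-style decomposition so that a region of any granularity (including one never seen as a training entity) maps to a small set of nodes. Everything here is an assumption for illustration and not the authors' implementation: the names `Node`, `build`, `cover`, and `embed` are hypothetical, regions are simplified to axis-aligned boxes, a single float stands in for a learned embedding, and area-weighted averaging is one possible aggregation among many.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Node:
    box: Box
    depth: int
    value: float = 0.0  # placeholder for a learned embedding (assumption)
    children: List["Node"] = field(default_factory=list)


def build(box: Box, depth: int, max_depth: int) -> Node:
    """Recursively split a box into four quadrants down to max_depth."""
    node = Node(box, depth)
    if depth < max_depth:
        x0, y0, x1, y1 = box
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        for child_box in [(x0, y0, xm, ym), (xm, y0, x1, ym),
                          (x0, ym, xm, y1), (xm, ym, x1, y1)]:
            node.children.append(build(child_box, depth + 1, max_depth))
    return node


def cover(node: Node, query: Box) -> List[Node]:
    """Segment-tree-style decomposition: return the coarsest nodes that
    together cover the part of the query lying inside node.box."""
    x0, y0, x1, y1 = node.box
    qx0, qy0, qx1, qy1 = query
    # No overlap with positive area: contribute nothing.
    if qx1 <= x0 or qx0 >= x1 or qy1 <= y0 or qy0 >= y1:
        return []
    fully_inside = qx0 <= x0 and qy0 <= y0 and qx1 >= x1 and qy1 >= y1
    # Stop at nodes fully inside the query, or at leaves.
    if fully_inside or not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(cover(child, query))
    return found


def overlap_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def embed(root: Node, query: Box) -> Optional[float]:
    """Area-weighted average of the covering nodes' values,
    or None if the query does not overlap the tree."""
    nodes = cover(root, query)
    weights = [overlap_area(n.box, query) for n in nodes]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * n.value for w, n in zip(weights, nodes)) / total
```

In this sketch, a coarse entity such as a large administrative region resolves to a few shallow nodes, while a small or irregular one resolves to deeper nodes, which is one way a single structure could serve entities of different granularities.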
Lay Summary: Traditionally, when we want to predict attributes of a geographical entity such as a province or city, we analyze historical data from that same entity. However, especially in scientific fields, it is often difficult to obtain historical records for the specific entity we want to predict. While we may lack data for our target entity, we frequently have access to many other datasets from the same region that contain historical records of related entities or sub-areas.
We propose a method that fuses these different types of regional datasets and stores them in a novel data structure called Segment Quadtree. This structure enables us to query information about entities that did not appear in our original datasets, significantly improving our prediction capabilities.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Geographical Modeling
Submission Number: 15394