Abstract: Data crucial to answering the economic, social and political questions facing our society tends to be diverse and distributed across sites on the Internet. Creating tools to integrate and analyze this data is of paramount interest, yet automating these processes remains a great challenge. Our work rests on the observation that a large number of public data sources, in domains ranging from economic to demographic, often share key similarities despite their complex structure. One of these similarities is the presence of time and location, two core attribute types. Our proposed framework, Data Integration through Object Modelling (DIOM), tackles the problem of automating data integration from a variety of public websites by abstracting key features of multi-dimensional tables and interpreting them in the context of a spatial and temporal model. Preliminary experimental results on real-world data sets from heterogeneous public data sources show an accuracy of over 93% in DIOM's entity identification.