Abstract: To harness the wealth of information available on the Web today, many organizations have begun aggregating public (and private) data to derive new knowledge bases. A fundamental challenge in constructing an accurate integrated knowledge repository from different data sources is understanding how facts across those sources relate to one another over time. This challenge, referred to as the temporal record linkage problem, goes far beyond the traditional record linkage problem because it requires a fine-grained analysis of how two facts are temporally related when they both refer to the same entity. In this paper, we present a new solution for understanding how two facts may be temporally related, and we exploit this knowledge to profile how entities evolve over time. Our solution makes use of a novel transition model that captures sophisticated patterns of value transitions. Specifically, the transition model captures the probability that an entity changes to a particular attribute value after a given time period. This transition model can be considered jointly with various source quality metrics to fine-tune how records are temporally linked to entities. In particular, we show how the freshness of data sources can be built into a source-aware temporal matching algorithm that jointly considers value transitions and source freshness to link temporal records to entities in the right time period. In this way, an increasingly complete and up-to-date entity profile can be derived as more temporal records are aggregated from different sources. Our experimental results on real-world datasets demonstrate that the proposed method outperforms state-of-the-art techniques and builds more complete entity profiles by identifying the entities' true matching temporal records in the right time period.
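To make the notion of a transition model concrete, the following minimal Python sketch estimates the probability that an attribute value changes to another value after a given elapsed time, using simple counts over known entity histories. It is an illustration only and not the model proposed in the paper: the function names, the fixed-width time buckets, the unsmoothed frequency estimate, and the job-title data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_transition_model(histories, bucket_size=5):
    """Count value transitions, grouped by source value and elapsed-time bucket.

    histories: one list per entity of (year, value) pairs, sorted by year.
    bucket_size: width in years of each elapsed-time bucket (an assumption).
    """
    counts = defaultdict(Counter)
    for history in histories:
        for i, (t1, v1) in enumerate(history):
            # Pair each observation with every later one for the same entity.
            for t2, v2 in history[i + 1:]:
                bucket = (t2 - t1) // bucket_size
                counts[(v1, bucket)][v2] += 1
    return counts


def transition_probability(counts, from_value, to_value, elapsed, bucket_size=5):
    """Estimate P(to_value | from_value, elapsed) as a relative frequency."""
    seen = counts.get((from_value, elapsed // bucket_size))
    if not seen:
        return 0.0  # no evidence for this (value, elapsed-time) combination
    return seen[to_value] / sum(seen.values())


# Hypothetical job-title histories for two entities.
histories = [
    [(2000, "Engineer"), (2003, "Manager"), (2010, "Director")],
    [(2001, "Engineer"), (2004, "Engineer"), (2012, "Manager")],
]
model = fit_transition_model(histories)
print(transition_probability(model, "Engineer", "Manager", elapsed=3))
print(transition_probability(model, "Director", "Engineer", elapsed=3))
```

In a linkage setting, such a probability could serve as one signal for whether a new record plausibly continues an entity's existing profile. A source-aware matcher of the kind described above would additionally weigh how fresh each source is, which this sketch does not attempt.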