Semantic-Based and Entity-Resolution Fusion to Enhance Quality of Big RDF Data

Published: 01 Jan 2021, Last Modified: 07 Aug 2024IEEE Trans. Big Data 2021EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Within an organisation, the quality in big data is a cornerstone to operational, transactional processes and to the reliability of business analytics for decision making. In fact, as organizations are harnessing multi-sources data to rise the benefits of their business, the quality of data becomes important and crucial. This paper presents a new approach to query big data sources using Resource Description Framework (RDF) representation to ensure data quality by harvesting more relevant and complete query results. Our approach handles two important types of heterogeneity over multiple data sources: semantic heterogeneity and URI-based entity identification. It proposes (1) a semantic entity resolution method based on inference mechanism using rules to manage the misunderstanding of data, in real world entities (2) Data Quality enhancement using MapReduce-based query rewriting approach includes the entity resolution results to infer and adds implicit data into query results (3) a parallel combination of MapReduce jobs of saturation and query rewriting inferences to handle transitive and cyclic rules for a richer rules' expression language (4) experiments to assess the efficiency of the proposed approach over real big RDF data originating from insurance and synthetic data sets.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview