Abstract: Within an organisation, big data quality is a cornerstone of operational and transactional processes and of the reliability of the business analytics used for decision making. As organisations harness data from multiple sources to increase the benefits to their business, data quality becomes crucial. This paper presents a new approach to querying big data sources through a Resource Description Framework (RDF) representation, which ensures data quality by retrieving more relevant and more complete query results. Our approach handles two important types of heterogeneity across multiple data sources: semantic heterogeneity and URI-based entity identification. It proposes (1) a semantic entity resolution method based on a rule-based inference mechanism that resolves misinterpretations of data describing the same real-world entities; (2) data quality enhancement through a MapReduce-based query rewriting approach that incorporates the entity resolution results to infer implicit data and add it to query results; (3) a parallel combination of MapReduce saturation and query rewriting jobs that handles transitive and cyclic rules, allowing a richer rule expression language; and (4) experiments assessing the efficiency of the proposed approach on real big RDF data from the insurance domain and on synthetic data sets.
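To illustrate why saturating transitive and cyclic identity rules yields more complete answers, the following is a minimal single-machine sketch. It is not the authors' MapReduce implementation. It assumes that entity resolution outputs `owl:sameAs` links, computes their symmetric-transitive closure by fixpoint iteration (which terminates even when the links form a cycle, because only finitely many pairs exist), and then expands a single triple-pattern query with the equivalent entities. The IRIs, the `ex:holdsPolicy` predicate, and the helper names `saturate_same_as` and `answer` are hypothetical.

```python
SAME_AS = "owl:sameAs"

# Hypothetical insurance triples; the two crm/web links form a cycle.
TRIPLES = [
    ("ins:client/42", "ex:holdsPolicy", "ins:policy/A"),
    ("crm:cust/7", SAME_AS, "ins:client/42"),
    ("crm:cust/7", SAME_AS, "web:user/x"),
    ("web:user/x", SAME_AS, "crm:cust/7"),
    ("ins:policy/A", SAME_AS, "legacy:pol/99"),
]


def saturate_same_as(triples):
    """Saturation: apply symmetry and transitivity rules to owl:sameAs until a fixpoint."""
    closure = {(s, o) for s, p, o in triples if p == SAME_AS}
    while True:
        # Symmetry rule: (a sameAs b) => (b sameAs a)
        derived = {(b, a) for a, b in closure}
        # Transitivity rule: (a sameAs b), (b sameAs d) => (a sameAs d); skip trivial self-links
        derived |= {(a, d) for a, b in closure for c, d in closure if b == c and a != d}
        if derived <= closure:  # nothing new: fixpoint reached, cycles cannot loop forever
            return closure
        closure |= derived


def answer(triples, predicate, obj, closure):
    """Answer the pattern (?s, predicate, obj), rewritten to use sameAs-equivalent entities."""
    targets = {obj} | {b for a, b in closure if a == obj}
    subjects = {s for s, p, o in triples if p == predicate and o in targets}
    # Add entities equivalent to each matched subject (implicit answers).
    return subjects | {b for a, b in closure if a in subjects}


closure = saturate_same_as(TRIPLES)
print(sorted(answer(TRIPLES, "ex:holdsPolicy", "legacy:pol/99", closure)))
print(answer(TRIPLES, "ex:holdsPolicy", "legacy:pol/99", set()))
```

With saturation, the query for holders of `legacy:pol/99` returns `crm:cust/7`, `ins:client/42` and `web:user/x`. Evaluated without inferred links (an empty closure), it returns an empty set, which is the incompleteness the approach targets. At big data scale the closure and rewriting steps would be distributed, for example as MapReduce jobs as the abstract describes, rather than computed with the quadratic in-memory loop used here.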