Adaptive and Parallel Data Acquisition from Online Big GraphsOpen Website

Published: 01 Jan 2018, Last Modified: 10 Feb 2024DASFAA (1) 2018Readers: Everyone
Abstract: Acquisition of contents from online big graphs (OBGs) like linked Web pages, social networks and knowledge graphs, is critical as data infrastructure for Web applications and massive data analysis. However, effective data acquisition is challenging due to the massive, heterogeneous, dynamically evolving properties of OBGs with unknown global topological structures. In this paper, we give an adaptive and parallel approach for effective data acquisition from OBGs. We adopt the ideas of Quasi Monte Carlo (QMC) and branch & bound methods to propose an adaptive Web-scale sampling algorithm for parallel data collection implemented upon Spark. Experimental results show the effectiveness and efficiency of our method.
0 Replies

Loading