FDataCollector: A Blockchain Based Friendly Web Data Collection System

Published: 01 Jan 2021, Last Modified: 26 Jul 2025, MSN 2021, CC BY-SA 4.0
Abstract: Over the last decade, a growing number of people have used web crawlers to collect data from the Internet for analysis. Web crawlers greatly increase the workload of web servers and hence hinder normal access to the websites hosted on those servers. Crawler traffic also reduces the effectiveness of web mining, which assumes that all accesses come from normal users. Moreover, the unlicensed collection of data from websites is often prohibited by the laws and regulations of governments and commercial organizations. To restrict data collection by web crawlers, websites currently apply anti-crawler techniques: crawler behavior is recognized and the corresponding accesses are denied. This addresses the aforementioned problems but becomes a major obstacle to data exchange, considering that the large volume of data on the Internet could be useful for many data analysis applications. The tension between crawler-based data collection and anti-crawler techniques demands a better solution. In this study, we propose FDataCollector, a friendly data sharing system that uses blockchain techniques to allow data collection while alleviating the workload of web servers. Data is first uploaded to the data sharing system by a few trusted users and then sold to public users in a traceable, P2P sharing manner. All other accesses by web crawlers are prohibited. On the user side, this design not only enables convenient data search but also improves download efficiency. On the data holder side, this traceable and beneficial way of sharing encourages them to share their data. We implement the system to demonstrate our idea. The results show that the system remains highly efficient even when many transactions occur at the same time.