A Parallel Implementation of Idea Graph to Extract Rare Chances from Big Data

Published: 01 Jan 2014, Last Modified: 18 Mar 2025, Venue: ICDM Workshops 2014, License: CC BY-SA 4.0
Abstract: Data sets today are far larger than before, and distributed computing systems are a prevalent option for handling them. The MapReduce framework is a powerful tool that provides a cheap and efficient way to write parallel programs for distributed computing systems. Chance discovery (CD) is an extension of data mining, where a chance refers to a rare but important event or situation. Idea Graph is an efficient algorithm proposed to detect chances. However, the traditional implementation of Idea Graph is sequential, and its performance becomes a bottleneck when dealing with big data. In this paper, we propose a parallel implementation of Idea Graph using MapReduce to better meet the challenge of big data. First, we introduce the MapReduce framework and briefly describe Idea Graph. We then present the details of the design of the parallel Idea Graph implementation. Finally, we conduct several experiments to evaluate the proposed implementation. The experimental results demonstrate the validity of the proposed implementation and show that it outperforms the sequential Idea Graph implementation when handling big data.