Abstract: Frequent itemset mining is a data mining technique that aims to extract new and interesting information from a dataset in the form of sets of items. Several algorithms have been proposed, successfully implemented, and used, such as Apriori, FP-Growth, and Eclat, along with numerous improvements. These algorithms work with two types of data layout, horizontal and vertical: the former is the traditional layout (individuals as rows and items as columns) and is more widely used because of its simplicity, whereas the latter brings substantial reductions in computation. Standard frequent itemset mining algorithms have a high computational complexity and, given the massive datasets now available, new approaches have been proposed in the literature that implement mining algorithms in the parallel, distributed, and, more recently, cloud computing paradigms. To overcome these computational drawbacks, in this paper we propose Apriori_V, a new parallel algorithm for frequent itemset mining from a vertical data layout, implemented on the MapReduce platform. Apriori_V brings significant improvements through (1) the use of the vertical data layout with an Apriori-like strategy, which reduces the number of operations by eliminating several Apriori-specific tasks such as pruning, and (2) a decrease in the underlying complexity and thus in the execution time.
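To make the vertical-layout idea concrete, the following is a minimal single-process sketch, not the authors' Apriori_V and without any MapReduce component. It assumes that each item is stored with the set of transaction ids containing it (a "tidset"), and that the support of a candidate itemset is obtained by intersecting the tidsets of two frequent itemsets one level below. The function names `to_vertical` and `mine_vertical` and the sample data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def to_vertical(transactions):
    """Convert a horizontal layout (one item collection per transaction)
    into a vertical layout: item -> set of transaction ids (tidset)."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def mine_vertical(transactions, min_support):
    """Level-wise (Apriori-like) mining on tidsets.

    Returns a dict mapping each frequent itemset (a frozenset) to its
    absolute support count. min_support is an absolute count (assumption).
    """
    vertical = to_vertical(transactions)
    # Level 1: frequent single items together with their tidsets.
    level = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }
    frequent = {}
    k = 1
    while level:
        frequent.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != k + 1 or candidate in next_level:
                continue
            # The tidset of the union is the intersection of the two tidsets,
            # so support is known immediately, with no extra pass over the data.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical toy dataset in the horizontal layout.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in mine_vertical(data, min_support=3).items():
        print(sorted(itemset), support)
```

In this sketch the intersection yields the exact support of each candidate directly, so there is no separate subset-based pruning step and no rescan of the transactions at each level. How Apriori_V distributes this work across map and reduce tasks is not described in the abstract and is not modelled here.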