Parallel High Utility Itemset Mining

Gaojuan Fan, Huaiyuan Xiao, Chongsheng Zhang, George Almpanidis, Philippe Fournier-Viger, Hamido Fujita

2022 (modified: 23 Dec 2022), IEA/AIE 2022

Abstract: Association rule mining is a popular data mining task that finds relationships between values in itemsets that frequently co-occur in a transactional database. It has many applications, but the “support-confidence” framework it relies on is inadequate in many cases. In recent years, a more general task called high utility itemset mining (HUIM) has gained popularity; it aims to discover itemsets that yield high revenue, as measured by a utility function. However, on large data volumes, the running time of state-of-the-art HUIM algorithms often grows exponentially. In this work, we investigate parallel HUIM (PHUIM) algorithms and adapt two state-of-the-art sequential HUIM algorithms for parallel processing on the Apache Spark in-memory data processing platform. Extensive experiments on several benchmark and synthetic datasets show that the proposed methods considerably improve the efficiency of the baseline HUIM algorithms.
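
To make the notion of a utility function concrete, the following is a minimal sketch of the common quantity-times-unit-profit definition of itemset utility, followed by a brute-force search for high utility itemsets. It is an illustration only and is not the parallel method proposed in the paper: the toy transactions, the unit profits, the threshold, and the function names `itemset_utility` and `high_utility_itemsets` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
]

# Hypothetical unit profit per item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Enumerate every itemset and keep those whose utility reaches min_utility.

    This exhaustive search is exponential in the number of distinct items
    and is only meant for tiny examples.
    """
    items = sorted({item for transaction in transactions for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    for itemset, utility in high_utility_itemsets(transactions, unit_profit, 15).items():
        print(itemset, utility)
```

With a threshold of 15 on this toy data, the itemset (b, d) has utility 14 and is rejected, while its superset (b, c, d) has utility 15 and is kept. Utility is therefore not anti-monotone, so the support-based pruning used in frequent itemset mining does not carry over directly, which is one reason HUIM is computationally demanding.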
