Abstract: With recent advances in technology and the vast amount of information now available, research in pattern mining has attracted growing attention. In particular, various techniques have been developed for clickstream mining, a specific type of sequential pattern mining, to discover the underlying patterns in Internet users' clickstreams. Owing to the complexity of clickstream patterns, many existing works apply sequential pattern algorithms that generate a candidate space of patterns that is exponential with respect to the pattern letters. Moreover, those patterns were generated in a noiseless environment. To address these problems, we focus on a nonoverlapping clickstream pattern mining task in which noisy clicks are interleaved between the letters of the clickstream patterns. Additionally, we are interested in labeling the extracted patterns in the user's browsing history. A modified suffix tree is proposed to extract those patterns together with their exact occurrences in the user's noisy database. We then model the user's browsing behavior with a Hidden Markov Model (HMM) to capture the dependencies between the extracted patterns and to predict future clickstream patterns. Experimental results on both real-life and synthetic datasets show that our proposed algorithms outperform the state-of-the-art benchmarks in efficiency and prediction accuracy.
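To make the task concrete, the following is a minimal sketch of what locating nonoverlapping occurrences of a pattern among noisy interleaved clicks could look like. It is a plain greedy scan, not the modified suffix tree proposed in the paper, and the function name `find_nonoverlapping`, the single-character click encoding, and the absence of any gap limit are all assumptions made for illustration.

```python
def find_nonoverlapping(clicks, pattern):
    """Return index tuples of nonoverlapping occurrences of `pattern`.

    Hypothetical helper for illustration only: clicks that do not match
    the next expected pattern letter are treated as noise and skipped.
    """
    if not pattern:
        return []
    occurrences = []
    current = []  # positions matched so far for the occurrence in progress
    for i, click in enumerate(clicks):
        # Compare against the next pattern letter still to be matched.
        if click == pattern[len(current)]:
            current.append(i)
            if len(current) == len(pattern):
                # Occurrence complete: record it and start a fresh one,
                # so recorded occurrences never share a click.
                occurrences.append(tuple(current))
                current = []
    return occurrences


# Clicks 'x' are noise interleaved between the letters of pattern "abc".
print(find_nonoverlapping(list("axbcabxc"), "abc"))
# → [(0, 2, 3), (4, 5, 7)]
```

The returned index tuples play the role of labels marking where each extracted pattern occurs in the browsing history; a sequence of such labels is the kind of input a downstream sequence model such as an HMM might consume.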
External IDs:dblp:conf/globecom/AlamoudiFML22