Parallel Sequence Mining on Shared-Memory Machines

Mohammed Javeed Zaki

Published: 2001, Last Modified: 28 Jan 2025. J. Parallel Distributed Comput. 2001. CC BY-SA 4.0

Abstract: We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor, requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12-processor SGI Origin 2000 shared-memory system show good speedup and excellent scaleup results.
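
The decomposition and dynamic interclass balancing described in the abstract can be illustrated with a minimal sketch. This is not the pSPADE implementation: the function names (`suffix_classes`, `process_class`, `mine_parallel`), the use of a length-1 suffix as the class key, and the placeholder "mining" step that only counts class members are all assumptions made for illustration. A thread pool stands in for the shared-memory processors; each idle worker takes the next pending class, so no synchronization is needed between classes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def suffix_classes(sequences, k=1):
    """Group sequences by their length-k suffix (hypothetical class key)."""
    classes = defaultdict(list)
    for seq in sequences:
        classes[tuple(seq[-k:])].append(tuple(seq))
    return dict(classes)


def process_class(item):
    """Placeholder for the per-class search: here it only counts members.

    A real miner would enumerate frequent sequences inside the class.
    """
    suffix, members = item
    return suffix, len(members)


def mine_parallel(sequences, workers=4):
    """Process each suffix class independently on a pool of workers."""
    classes = suffix_classes(sequences)
    # Submit larger classes first; the pool hands the next pending class
    # to whichever worker becomes free (a simple dynamic assignment).
    work = sorted(classes.items(), key=lambda kv: len(kv[1]), reverse=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_class, work))


if __name__ == "__main__":
    seqs = [("A", "B"), ("C", "B"), ("A", "C"), ("B", "A", "C"), ("D",)]
    print(mine_parallel(seqs, workers=2))
```

Because the classes share no state, the per-class results can be merged by a plain dictionary union. Intraclass balancing, which the abstract also mentions, is not shown here.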