Abstract: Discovering frequent patterns in large semistructured data repositories is one of the hardest categories of tree mining problems, because it requires discovering unordered embedded tree patterns. Existing work has focused primarily on discovering ordered, induced trees. This work proposes WTIMiner, a divide-and-conquer algorithm that discovers the complete set of frequent unordered embedded subtrees. The algorithm reduces the complexity of the pattern-matching and counting problems that a regular tree mining algorithm faces. Experimental results demonstrate the efficiency and scalability of WTIMiner in terms of both time and space.
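To make the matching and counting problem concrete, the Python sketch below checks whether a pattern occurs as an unordered embedded subtree of a data tree and counts support over a small database. It is a brute-force backtracking baseline, not WTIMiner, whose divide-and-conquer strategy the abstract does not describe. It assumes the usual definition: the mapping from pattern nodes to data nodes is injective, preserves labels, and preserves ancestor-descendant relationships (so a parent-child edge in the pattern may span several levels in the data), while sibling order is ignored. It also assumes transaction-based support (the number of data trees containing the pattern at least once). The names `Tree`, `is_embedded`, and `support` are hypothetical.

```python
from typing import List, Sequence, Set


class Tree:
    """Rooted, labelled tree stored as parallel label/parent arrays.

    Assumption: nodes are numbered in pre-order, so parent[0] == -1
    (the root) and 0 <= parent[i] < i for every other node.
    """

    def __init__(self, labels: Sequence[str], parent: Sequence[int]) -> None:
        if len(labels) != len(parent):
            raise ValueError("labels and parent must have the same length")
        self.labels = list(labels)
        self.parent = list(parent)
        # anc[i] holds the proper ancestors of node i.
        self.anc: List[Set[int]] = []
        for i, p in enumerate(self.parent):
            if (p < 0) != (i == 0) or p >= i:
                raise ValueError("expected a single root and pre-order numbering")
            self.anc.append(set() if p < 0 else self.anc[p] | {p})

    def __len__(self) -> int:
        return len(self.labels)


def is_embedded(pattern: Tree, data: Tree) -> bool:
    """Return True if `pattern` occurs in `data` as an unordered embedded subtree."""
    n = len(pattern)
    mapping = [-1] * n      # mapping[i] = data node assigned to pattern node i
    used: Set[int] = set()  # keeps the mapping injective

    def extend(i: int) -> bool:
        if i == n:
            return True
        for d in range(len(data)):
            if d in used or data.labels[d] != pattern.labels[i]:
                continue
            # Pattern nodes are in pre-order, so an earlier node j is either an
            # ancestor of i or unrelated to it. The data nodes must agree:
            # ancestor <-> ancestor, unrelated <-> unrelated. Sibling order is
            # never checked, which is what makes the match unordered.
            if all(
                (mapping[j] in data.anc[d]) == (j in pattern.anc[i])
                and d not in data.anc[mapping[j]]
                for j in range(i)
            ):
                mapping[i] = d
                used.add(d)
                if extend(i + 1):
                    return True
                used.discard(d)
                mapping[i] = -1
        return False

    return extend(0)


def support(pattern: Tree, database: Sequence[Tree]) -> int:
    """Number of database trees that contain `pattern` at least once."""
    return sum(1 for t in database if is_embedded(pattern, t))


if __name__ == "__main__":
    # Pattern: A with children D and C (listed in that order).
    p = Tree(["A", "D", "C"], [-1, 0, 0])
    # T1: A -> B -> C and A -> D. C is a grandchild of A and precedes D in
    # pre-order, so p matches only as an unordered *embedded* subtree.
    t1 = Tree(["A", "B", "C", "D"], [-1, 0, 1, 0])
    # T2: A -> D -> C. D and C are ancestor/descendant here, not siblings.
    t2 = Tree(["A", "D", "C"], [-1, 0, 1])
    t3 = Tree(["B", "C"], [-1, 0])
    print(is_embedded(p, t1), is_embedded(p, t2))  # True False
    print(support(p, [t1, t2, t3]))                # 1
```

Because this baseline may try many candidate mappings per pattern and per data tree, its cost can grow exponentially with pattern size in the worst case; that matching and counting burden is what an algorithm such as WTIMiner aims to reduce.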