Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications

Mohammed Javeed Zaki

Published: 2005, Last Modified: 28 Jan 2025. IEEE Trans. Knowl. Data Eng. 2005. CC BY-SA 4.0

Abstract: Mining frequent trees is useful in domains such as bioinformatics, Web mining, and mining semistructured data. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern-matching tree mining algorithm (PATTERNMATCHER), and we also compare it with TREEMINERD, which counts only distinct occurrences of a pattern. We conduct detailed experiments to test the performance and scalability of these methods. We also use tree mining to analyze RNA structure and phylogenetics data sets from the bioinformatics domain.
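
To make the problem statement concrete, the following is a minimal brute-force sketch of testing whether a pattern occurs as an embedded subtree of a rooted, labeled, ordered tree, and of counting in how many trees of a forest it occurs. It is not TREEMINER, PATTERNMATCHER, or TREEMINERD, and it does not reproduce the scope-list structure. The names `Node`, `index_tree`, `embeds_at`, `contains`, and `support` are hypothetical. The sketch assumes that "embedded" means a pattern edge may map to any ancestor-descendant pair in the tree, that pattern siblings must map to disjoint subtrees in left-to-right order, and that support counts each tree at most once.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a rooted, labeled, ordered tree (hypothetical helper type)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def index_tree(root):
    """Number nodes in preorder; ends[i] is the position of the
    rightmost descendant of node i (an interval encoding of its subtree)."""
    labels, ends = [], []

    def visit(node):
        pos = len(labels)
        labels.append(node.label)
        ends.append(pos)
        for child in node.children:
            visit(child)
        ends[pos] = len(labels) - 1

    visit(root)
    return labels, ends


def embeds_at(pattern, labels, ends, t):
    """True if the pattern root can be mapped to tree position t."""
    if pattern.label != labels[t]:
        return False

    def place(i, start):
        # Map pattern.children[i:] to descendants of t at positions >= start.
        if i == len(pattern.children):
            return True
        for d in range(start, ends[t] + 1):
            # The next sibling must start after the subtree rooted at d,
            # which keeps sibling images disjoint and in left-to-right order.
            if embeds_at(pattern.children[i], labels, ends, d) and place(i + 1, ends[d] + 1):
                return True
        return False

    return place(0, t + 1)


def contains(pattern, root):
    """True if the pattern is embedded anywhere in the tree."""
    labels, ends = index_tree(root)
    return any(embeds_at(pattern, labels, ends, t) for t in range(len(labels)))


def support(pattern, forest):
    """Number of trees in the forest with at least one occurrence (assumed definition)."""
    return sum(contains(pattern, tree) for tree in forest)


# Example forest: A(B(C, D)) and A(D).
forest = [
    Node("A", [Node("B", [Node("C"), Node("D")])]),
    Node("A", [Node("D")]),
]
pattern_cd = Node("A", [Node("C"), Node("D")])  # C and D are descendants of A, not children
pattern_dc = Node("A", [Node("D"), Node("C")])  # same labels, sibling order reversed
pattern_d = Node("A", [Node("D")])
```

Here `pattern_cd` occurs in the first tree only through ancestor-descendant edges, so an induced-subtree test would reject it, while `pattern_dc` is rejected because the sibling order is not preserved. The search is exponential in the worst case and is meant only to illustrate the definitions.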