Minimal Cost Complexity Pruning of Meta-Classifiers

AAAI/IAAI 1999 (modified: 16 Jul 2019)
Abstract: Integrating multiple learned classification models (classifiers) computed over large and (physically) distributed data sets has been demonstrated as an effective approach to scaling inductive learning techniques, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system resources. The final ensemble meta-classifier may consist of a large collection of base classifiers that require increased memory resources while also slowing down classification throughput. To classify unlabeled instances, predictions need to be generated from all base classifiers before the meta-classifier can produce its final classification. The throughput (prediction rate) of a meta-classifier is of significant importance in real-time systems, such as e-commerce or intrusion detection. This extended abstract describes a pruning algorithm that is independent of the combining scheme and is used for discarding redundant classifiers without degrading the overall predictive performance of the pruned meta-classifier. To determine the most effective base classifiers, the algorithm takes advantage of the minimal cost-complexity pruning method of the CART learning algorithm (Breiman et al. 1984), which is guaranteed to find the best (with respect to misclassification cost) pruned tree of a specific size (number of terminal nodes) of an initial unpruned decision tree. An alternative pruning method, using Rissanen's minimum description length, is described in (Quinlan & Rivest 1989).
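For readers unfamiliar with minimal cost-complexity pruning, it scores each pruned subtree T by R_alpha(T) = R(T) + alpha * |T~|, where R(T) is the misclassification cost and |T~| the number of terminal nodes; sweeping alpha upward yields a nested sequence of optimally pruned subtrees of decreasing size. The sketch below illustrates only this underlying CART mechanism using scikit-learn, not the paper's meta-classifier pruning algorithm; the dataset and parameter choices are assumptions made purely for demonstration.

```python
# Sketch of minimal cost-complexity pruning (Breiman et al. 1984) on a
# single decision tree via scikit-learn. This is NOT the paper's
# meta-classifier pruning algorithm; dataset and parameters are
# illustrative assumptions only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a full, unpruned tree, then compute the sequence of effective alphas.
# Each alpha minimizes R_alpha(T) = R(T) + alpha * |leaves(T)|, so each step
# gives the lowest-cost pruned subtree of its size.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    print(f"alpha={alpha:.5f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```

Printing the number of leaves against held-out accuracy shows the trade-off the paper exploits: a much smaller pruned model can often match the full model's predictive performance.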