Abstract: We study the problem of building a regression tree of relatively small size that maximizes the Kendall's tau coefficient between the anomaly scores of a source anomaly detection algorithm and those predicted by our regression tree. We consider a labeling function that assigns to each leaf the inverse of its size, thereby providing satisfactory explanations when comparing examples with different anomaly scores. We show that our approach can be used as a post-hoc model, i.e. to provide global explanations for an existing anomaly detection algorithm. Moreover, it can be used as an in-model approach, i.e. the source anomaly detection algorithm can be replaced altogether. This is made possible by leveraging the off-the-shelf transparency of tree-based approaches and the fact that the explanations provided by our approach do not rely on the source anomaly detection algorithm. The main technical challenge is the efficient computation of the Kendall's tau coefficients when determining the best split at each node of the regression tree. We show how such a coefficient can be computed incrementally, thereby making the running time of our algorithm almost linear (up to a logarithmic factor) in the size of the input. Our approach is completely unsupervised, which is appealing when it is difficult to collect a large number of labeled examples. We complement our study with an extensive experimental evaluation against the state of the art, showing the effectiveness of our approach.
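To make the objective concrete, the following is a minimal sketch (not the authors' implementation) of the fidelity measure the abstract describes: each example is labeled with the inverse of the size of the leaf it falls into, and the quality of the tree is the Kendall's tau between these labels and the source anomaly scores. It uses scipy's off-the-shelf kendalltau rather than the paper's incremental per-split computation, and the names (source_scores, leaf_ids, tree_fidelity) are illustrative assumptions.

```python
from collections import Counter
from scipy.stats import kendalltau


def inverse_leaf_size_scores(leaf_ids):
    """Score of an example = 1 / (number of examples in its leaf)."""
    sizes = Counter(leaf_ids)
    return [1.0 / sizes[leaf] for leaf in leaf_ids]


def tree_fidelity(source_scores, leaf_ids):
    """Kendall's tau between the source scores and the tree's leaf-based scores."""
    tau, _ = kendalltau(source_scores, inverse_leaf_size_scores(leaf_ids))
    return tau


# Hypothetical example: the two examples in the smaller leaf "B" receive the
# higher score 1/2, matching the ranking induced by the source detector.
source_scores = [0.9, 0.1, 0.2, 0.15, 0.85]
leaf_ids = ["B", "A", "A", "A", "B"]
print(tree_fidelity(source_scores, leaf_ids))
```

Note that this naive recomputation of tau for every candidate split would be quadratic overall; the abstract's claim of an almost-linear running time rests on updating the coefficient incrementally as the split point moves.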