Keywords: Decision Tree, Effort-To-Compress, Structural Impurity, Permutation Bagging, Machine Learning
Abstract: The Decision Tree is a well-understood Machine Learning model that is built by minimizing impurity at the internal nodes. The most common impurity measures are \emph{Shannon entropy} and \emph{Gini impurity}. These impurity measures are insensitive to the order of the training data, and hence the final tree is invariant to any permutation of the data. This is a serious limitation when modeling data instances that have order dependencies. In this work, we use \emph{Effort-To-Compress} (ETC), a complexity measure, for the first time as an impurity measure. Unlike Shannon entropy and Gini impurity, the structural impurity based on ETC is able to capture order dependencies in the data, thus yielding potentially different decision trees for different permutations of the same data instances (\emph{Permutation Decision Trees}). We then introduce the notion of \emph{Permutation Bagging}, achieved using Permutation Decision Trees without the need for random feature selection and sub-sampling. We compare the performance of the proposed permutation-bagged decision trees with Random Forest. Our model does not assume that data instances are independent and identically distributed. Potential applications include scenarios where a temporal order is present in the data instances.
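To make the role of ETC concrete, below is a minimal Python sketch of its standard formulation: the number of Non-Sequential Recursive Pair Substitution (NSRPS) iterations needed to reduce a symbolic sequence to a constant sequence. The function name, the integer symbol encoding, and the tie-breaking among equally frequent pairs are illustrative assumptions, not the paper's reference implementation.

```python
from collections import Counter

def effort_to_compress(seq):
    """Effort-To-Compress (ETC) via NSRPS: repeatedly replace the most
    frequent adjacent pair of symbols with a new symbol; ETC is the
    number of substitution steps until the sequence becomes constant."""
    seq = list(seq)
    steps = 0
    next_symbol = max(seq) + 1 if seq else 0  # assumes integer-coded symbols
    while len(seq) > 1 and len(set(seq)) > 1:
        pair_counts = Counter(zip(seq, seq[1:]))
        target = pair_counts.most_common(1)[0][0]  # ties broken by first occurrence
        out, i = [], 0
        while i < len(seq):
            # substitute non-overlapping occurrences, scanning left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
        steps += 1
    return steps
```

With the tie-breaking used in this sketch, effort_to_compress([1, 2, 1, 2, 1, 2]) returns 1 while effort_to_compress([1, 1, 1, 2, 2, 2]) returns 5, illustrating why an ETC-based impurity can distinguish orderings that Shannon entropy and Gini impurity treat identically.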
Submission Number: 2868