Abstract: Decision forests, in particular random forests and gradient boosted trees, have demonstrated state-of-the-art accuracy in many supervised learning scenarios. Forests dominate other methods on tabular data, that is, when the feature space is unstructured, so that the signal is invariant to permutations of the feature indices. However, on structured data lying on a manifold, such as images and time series, deep networks, specifically convolutional deep networks (ConvNets), tend to outperform forests. We conjecture that this is in part because networks analyze not only feature magnitudes but also their indices, whereas naïve forest implementations fail to explicitly consider feature indices. A recent approach demonstrates that forests, at each node, implicitly sample a random matrix from some specific distribution. These forests, like some networks, learn by partitioning the feature space into convex polytopes corresponding to linear functions. We build on that approach with Manifold Oblique Random Forests (Morf), which chooses these distributions in a manifold-aware fashion to incorporate feature locality. Morf runs fast and maintains interpretability and theoretical justification. Morf also has excellent empirical classification performance on simulated data, real images, and multivariate time series, outperforming non-neural-network approaches that ignore feature-space structure and challenging the performance of ConvNets in some cases.
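To make the manifold-aware sampling concrete, here is a minimal sketch in Python of one way a node could draw a locality-respecting projection on 2D image data. This is an illustration under assumptions, not the paper's implementation: the function name, the rectangular-patch parameterization, and `max_patch` are all hypothetical. The point it demonstrates is that, instead of scattering nonzero projection weights over arbitrary feature indices as index-agnostic oblique forests do, each candidate split activates a contiguous patch of pixels.

```python
import numpy as np

def sample_patch_projection(height, width, max_patch=3, rng=None):
    """Sample one manifold-aware projection vector: a contiguous
    rectangular patch of pixel indices (hypothetical sketch, not
    the Morf reference implementation)."""
    rng = rng or np.random.default_rng()
    ph = rng.integers(1, max_patch + 1)        # patch height
    pw = rng.integers(1, max_patch + 1)        # patch width
    top = rng.integers(0, height - ph + 1)     # top-left corner row
    left = rng.integers(0, width - pw + 1)     # top-left corner column
    proj = np.zeros(height * width)
    for r in range(top, top + ph):
        proj[r * width + left : r * width + left + pw] = 1.0
    return proj

# Usage: form one candidate split feature on flattened 8x8 images.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                 # 100 samples, 8x8 pixels each
v = sample_patch_projection(8, 8, rng=rng)
split_feature = X @ v                          # the node thresholds this scalar
```

Because the nonzero weights are confined to neighboring pixels, the induced linear split respects the image's spatial structure, which is the locality property the abstract attributes to Morf.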