TL;DR: Decision trees with geodesically convex splits extend to product space manifolds
Abstract: Decision trees (DTs) and their random forest (RF) extensions are workhorses of classification and regression in Euclidean spaces. However, algorithms for learning in non-Euclidean spaces are still limited. We extend DT and RF algorithms to product manifolds: Cartesian products of several hyperbolic, hyperspherical, or Euclidean components. Such manifolds handle heterogeneous curvature while still factorizing neatly into simpler components, making them compelling embedding spaces for complex datasets. Our novel angular reformulation respects manifold geometry while preserving the algorithmic properties that make decision trees effective. In the special cases of single-component manifolds, our method simplifies to its Euclidean or hyperbolic counterparts, or introduces hyperspherical DT algorithms, depending on the curvature. In benchmarks on a diverse suite of 57 classification, regression, and link prediction tasks, our product RFs ranked first on 29 tasks and came in the top 2 for 41. This highlights the value of product RFs as straightforward yet powerful new tools for data analysis in product manifolds. Code for our method is available at https://github.com/pchlenski/manify.
Lay Summary: We are interested in how decision trees partition the input space, with an eye towards adapting decision trees and random forests to more complicated non-Euclidean geometries. We claim that decision trees and random forests are successful for Euclidean inputs because they have 5 desirable properties:
1. **Continuity:** Their leaves are single regions of space, never islands;
2. **(Geodesic) convexity:** For each leaf, the shortest path between any two points in that leaf stays completely inside that leaf;
3. **Equidistance:** Splits fall exactly midway between the nearest point on either side;
4. **Efficiency:** We consider a manageable number of split candidates ($\mathcal{O}(nd)$ to be exact); and
5. **Speed:** We can evaluate each split in constant time.
We show how to extend decision trees to mixed-curvature product manifolds, non-Euclidean spaces built from simple components like spheres, hyperboloids, and Euclidean subspaces, while preserving these 5 properties. We then show how many kinds of problems can be viewed as classification or regression on product manifolds, benchmark our method on 57 such problems, and find that it outperforms competing methods on a majority of them.
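To make the 5 properties concrete, here is a minimal illustrative sketch (not the authors' implementation from `manify`) of an angular split on one 2-D coordinate pair of a spherical component: candidate thresholds are angular midpoints between consecutive sorted points (equidistance, $\mathcal{O}(n)$ candidates per coordinate pair), and each candidate is scored in constant time by updating class counts incrementally. All function names here are hypothetical.

```python
import numpy as np

def angular_split_candidates(X, d1, d2):
    """Candidate angular thresholds for the coordinate pair (d1, d2).

    Each point is mapped to an angle theta = atan2(x[d2], x[d1]);
    candidates are midpoints between consecutive sorted angles,
    mirroring the equidistance property of Euclidean midpoint splits.
    """
    theta = np.arctan2(X[:, d2], X[:, d1])
    t = np.sort(theta)
    return (t[:-1] + t[1:]) / 2.0

def best_gini_split(theta, y, thresholds):
    """Scan thresholds over sorted angles, updating class counts
    incrementally so each split is evaluated in O(1) (the speed
    property). Returns the threshold with the lowest weighted Gini."""
    order = np.argsort(theta)
    t, labels = theta[order], y[order]
    classes = np.unique(y)
    left = np.zeros(len(classes))
    right = np.array([(labels == c).sum() for c in classes], dtype=float)
    best, best_thr = np.inf, None
    j = 0
    for thr in np.sort(thresholds):
        # move points whose angle falls below thr from right to left
        while j < len(t) and t[j] <= thr:
            c = np.searchsorted(classes, labels[j])
            left[c] += 1
            right[c] -= 1
            j += 1
        nl, nr = left.sum(), right.sum()
        if nl == 0 or nr == 0:
            continue
        gini = (nl * (1 - ((left / nl) ** 2).sum())
                + nr * (1 - ((right / nr) ** 2).sum()))
        if gini < best:
            best, best_thr = gini, thr
    return best_thr, best
```

Because the threshold is a single angle on a great circle, each leaf is one contiguous arc, which is what gives continuity and geodesic convexity in this toy setting.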
Link To Code: https://github.com/pchlenski/manify
Primary Area: General Machine Learning->Representation Learning
Keywords: representation learning, non-euclidean geometry, decision trees, random forests, hyperbolic geometry, hyperspherical geometry
Submission Number: 8019