Keywords: Optimization, Regression trees, Newton method, Convergence
Abstract: Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries—but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics.
We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power.
The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss–Newton) method within fixed partitions.
We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization.
We further prove that HRT’s model class is a universal approximator with an explicit $O(\delta^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.
Supplementary Material: zip
Primary Area: optimization
Submission Number: 15408
Loading