Active Learning for Decision Trees with Provable Guarantees

ICLR 2026 Conference Submission 14971 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Label complexity, Theory of Active Learning, Theory of Decision Trees, Disagreement coefficient
TL;DR: We provide the first analysis of the disagreement coefficient for decision trees, and we develop a general active learning algorithm for binary classification with a multiplicative error guarantee.
Abstract: This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the **disagreement coefficient** for decision trees—a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity: (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a **multiplicative error guarantee**, producing a $(1+\epsilon)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a **polylogarithmic number of label queries** in the dataset size, under the stated assumptions. Finally, we establish a label complexity lower bound, showing our algorithm’s dependence on the error tolerance $\epsilon$ is close to optimal.
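The disagreement coefficient mentioned above governs how quickly disagreement-based active learners (in the style of CAL) shrink the region where candidate hypotheses disagree, and hence how few labels they need. As a minimal illustration of that mechanism — not the paper's decision-tree algorithm, and with all names our own — here is the classic one-dimensional threshold case, where the version space is an interval, the disagreement region is the set of points inside it, and only those points are ever queried:

```python
def cal_threshold(points, oracle):
    """Actively learn a threshold classifier over sorted `points`.

    Labels are assumed to be 0 below some unknown threshold and 1 at or
    above it. The version space of consistent thresholds corresponds to
    the index interval [lo, hi]; points outside it have inferred labels,
    so the oracle is queried only inside the region of disagreement.
    Returns (boundary_index, number_of_label_queries).
    """
    lo, hi = 0, len(points)
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2  # any point in the disagreement region works
        queries += 1
        if oracle(points[mid]):  # label 1: threshold is at or before mid
            hi = mid
        else:                    # label 0: threshold is after mid
            lo = mid + 1
    return lo, queries


# Hypothetical usage: 1000 points, true threshold at 321.
boundary, queries = cal_threshold(list(range(1000)), lambda x: x >= 321)
print(boundary, queries)  # boundary == 321, queries is O(log 1000)
```

Each query halves the disagreement interval, giving the logarithmic label complexity that passive learning cannot match; the paper's contribution is bounding the analogous quantity for decision trees, where the disagreement region is far less tractable.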
Primary Area: learning theory
Submission Number: 14971