Adaptive Non-disjoint Discretization for Tree Augmented Naive Bayes

Published: 2025, Last Modified: 26 Jan 2026 · ADMA (4) 2025 · CC BY-SA 4.0
Abstract: As one of the most prominent improvements to Naive Bayes, Tree Augmented Naive Bayes (TAN) allows each attribute to depend on at most one other attribute in addition to the class label, and has exhibited excellent classification performance across various real-world applications. Before constructing the TAN model, discretization is often employed as a pre-processing step that transforms numerical attributes into nominal ones. However, existing discretization methods are typically either model-agnostic or specifically tailored for standard Naive Bayes, and thus fail to account for the complex attribute dependencies present in TAN. Furthermore, most existing discretization methods require a pre-defined, fixed number of intervals or derive one roughly from the data size; few determine it adaptively for each dataset based on classification performance. To address these limitations, this paper proposes an Adaptive Non-Disjoint Discretization (ANDD) method for TAN. In the training phase, ANDD adds a scaling factor to search for an optimal interval number for each dataset. In the classification phase, ANDD simultaneously considers the atomic intervals of both parent and child attributes, assigning a different weight to each atomic interval to enhance the accuracy of conditional probability estimation. Extensive experiments on 49 benchmark datasets validate the superiority of ANDD.
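To make the TAN dependency structure concrete, the following is a minimal sketch of TAN classification over nominal attributes, where each attribute conditions on the class and on at most one parent attribute. All parameters here (class prior, parent assignments, conditional probability tables) are hypothetical toy values for illustration; this is a generic TAN scorer, not the paper's ANDD discretization method.

```python
from math import log

# Toy TAN model over three nominal attributes A0, A1, A2 (hypothetical
# parameters, not from the paper). In TAN each attribute depends on the
# class and on at most one other attribute: here A0 is the root (class-only
# parent), while A1 and A2 both have A0 as their parent.
prior = {"c0": 0.6, "c1": 0.4}
parent = {0: None, 1: 0, 2: 0}  # attribute index -> parent attribute index

# Conditional probability tables: cpt[i][(value, class, parent_value)]
cpt = {
    0: {("a", "c0", None): 0.7, ("b", "c0", None): 0.3,
        ("a", "c1", None): 0.2, ("b", "c1", None): 0.8},
    1: {("x", "c0", "a"): 0.9, ("y", "c0", "a"): 0.1,
        ("x", "c0", "b"): 0.4, ("y", "c0", "b"): 0.6,
        ("x", "c1", "a"): 0.5, ("y", "c1", "a"): 0.5,
        ("x", "c1", "b"): 0.1, ("y", "c1", "b"): 0.9},
    2: {("p", "c0", "a"): 0.8, ("q", "c0", "a"): 0.2,
        ("p", "c0", "b"): 0.5, ("q", "c0", "b"): 0.5,
        ("p", "c1", "a"): 0.3, ("q", "c1", "a"): 0.7,
        ("p", "c1", "b"): 0.6, ("q", "c1", "b"): 0.4},
}

def tan_predict(instance):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c, x_pa(i))."""
    scores = {}
    for c, p in prior.items():
        s = log(p)
        for i, v in enumerate(instance):
            pv = instance[parent[i]] if parent[i] is not None else None
            s += log(cpt[i][(v, c, pv)])
        scores[c] = s
    return max(scores, key=scores.get)

print(tan_predict(("a", "x", "p")))  # -> "c0": 0.6*0.7*0.9*0.8 beats 0.4*0.2*0.5*0.3
```

Discretization fits in upstream of this scorer: numerical attribute values must first be mapped to nominal intervals before the CPT lookups above apply, which is the step ANDD adapts per dataset.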