Abstract: Highlights
• A testing-based decision tree, SigDT, is presented for clustering categorical data.
• The split point evaluation issue is formulated as a multiple testing problem.
• SigDT conducts clusterability prediction and cluster analysis simultaneously.
• SigDT determines the number of clusters automatically via significance testing.
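The idea in the highlights can be pictured with a small sketch. This is not the SigDT procedure itself, whose test statistic and correction method are not given here. It is a minimal illustration under stated assumptions: all attributes are binary (coded 0/1), each candidate split is scored with Pearson's chi-square test of independence against every other attribute, and the multiple tests are handled with a Bonferroni correction. The names `chi2_p_2x2`, `best_split`, `cluster`, `alpha`, and `min_size` are hypothetical.

```python
import math


def chi2_p_2x2(x, y):
    """p-value of Pearson's chi-square independence test for two 0/1 columns."""
    n = len(x)
    table = [[0, 0], [0, 0]]  # 2x2 contingency table of joint counts
    for a, b in zip(x, y):
        table[a][b] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in row or 0 in col:
        return 1.0  # a constant column gives no evidence of dependence
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def best_split(rows, alpha=0.05):
    """Return (attribute index, corrected p-value) of a significant split, or None."""
    m = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    n_tests = m * (m - 1) // 2  # number of distinct attribute pairs tested
    best = None
    for j in range(m):
        for k in range(m):
            if k == j:
                continue
            # Bonferroni correction (an assumption of this sketch).
            p = min(1.0, chi2_p_2x2(cols[j], cols[k]) * n_tests)
            if best is None or p < best[1]:
                best = (j, p)
    if best is not None and best[1] < alpha:
        return best
    return None  # no significant split: treat the node as not clusterable


def cluster(rows, alpha=0.05, min_size=4):
    """Split recursively while a significant split exists; each leaf is a cluster."""
    if len(rows) < 2 * min_size:
        return [rows]
    split = best_split(rows, alpha)
    if split is None:
        return [rows]
    j = split[0]
    left = [r for r in rows if r[j] == 0]
    right = [r for r in rows if r[j] == 1]
    return cluster(left, alpha, min_size) + cluster(right, alpha, min_size)


if __name__ == "__main__":
    data = [(0, 0, 0)] * 10 + [(1, 1, 1)] * 10
    print(len(cluster(data)))
```

In this sketch the number of clusters is never passed in. It is simply the number of leaves at which no corrected p-value falls below `alpha`, and a root node with no significant split signals that the data set shows no cluster structure under these tests.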