Enhanced Adjacency-Constrained Hierarchical Clustering Using Fine-Grained Pseudo Labels
Abstract: Hierarchical clustering can provide partitions at different
levels of granularity. However, most existing hierarchical
clustering techniques perform clustering in the original feature
space of the data, which may suffer from overlap, sparseness, or
other undesirable characteristics, leading to uncompetitive performance.
In deep clustering, learning representations
from pseudo labels has recently become an active research topic. Yet
most existing approaches employ coarse-grained pseudo labels,
which may be noisy or incorrect, so the learned
feature space does not yield a competitive model. In this paper,
we introduce the idea of fine-grained labels from supervised
learning into unsupervised clustering, giving rise to the enhanced
adjacency-constrained hierarchical clustering (ECHC) model. The
full framework comprises four steps. First, adjacency-constrained
hierarchical clustering (CHC) is used to produce relatively pure
fine-grained pseudo labels. Second, these fine-grained pseudo labels
are used to train a shallow multilayer perceptron that learns good
representations. Third, the learned representation of each
sample is used to construct a similarity matrix.
Fourth, CHC generates the final partition from this
similarity matrix. The experimental results show that the proposed
ECHC framework not only outperforms 14 shallow clustering
methods on eight real-world datasets but also surpasses current
state-of-the-art deep clustering models on six real-world datasets.
In addition, on five real-world datasets, ECHC achieves results
comparable to supervised algorithms.
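The four-step pipeline described above can be sketched end to end. The sketch below is a minimal illustration, not the authors' implementation: it stands in for CHC with a k-NN-constrained, single-linkage (Kruskal-style) merge, uses a tiny NumPy multilayer perceptron as the shallow network, and runs on a toy two-blob dataset; all function names, cluster counts, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def knn_edges(X, k):
    # Candidate merges are restricted to each point's k nearest neighbors
    # (this is the "adjacency constraint" in this sketch), sorted by distance.
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    edges = []
    for i in range(len(X)):
        for j in np.argsort(D[i])[1:k + 1]:  # skip self at position 0
            edges.append((D[i, int(j)], i, int(j)))
    return sorted(edges)

def constrained_single_linkage(X, n_clusters, k=5):
    # Adjacency-constrained single-linkage clustering: merge the cheapest
    # neighbor pair first (union-find), stopping at n_clusters components.
    parent = list(range(len(X)))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    n = len(X)
    for _, i, j in knn_edges(X, k):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n -= 1
            if n == n_clusters:  # may stop early if the k-NN graph is disconnected
                break
    relabel = {r: c for c, r in enumerate(sorted({find(i) for i in range(len(X))}))}
    return np.array([relabel[find(i)] for i in range(len(X))])

def mlp_representation(X, y, hidden=16, epochs=300, lr=0.1, seed=0):
    # Shallow one-hidden-layer MLP trained on the pseudo labels with
    # softmax cross-entropy; returns the learned hidden representation.
    rng = np.random.default_rng(seed)
    n_cls = int(y.max()) + 1
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_cls)); b2 = np.zeros(n_cls)
    Y = np.eye(n_cls)[y]
    for _ in range(epochs):
        H = np.maximum(X @ W1 + b1, 0)                      # ReLU hidden layer
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(1, keepdims=True)); P /= P.sum(1, keepdims=True)
        G = (P - Y) / len(X)                                # softmax CE gradient
        GH = (G @ W2.T) * (H > 0)
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(0)
        W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0)
    return np.maximum(X @ W1 + b1, 0)

# Toy data: two well-separated Gaussian blobs of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])

pseudo = constrained_single_linkage(X, n_clusters=6)   # step 1: fine-grained pseudo labels
H = mlp_representation(X, pseudo)                      # step 2: learned representation
final = constrained_single_linkage(H, n_clusters=2)    # steps 3-4: cluster in learned space
```

In the learned space, pairwise distances between the hidden representations play the role of the similarity matrix; a full implementation would follow the paper in using CHC rather than this single-linkage stand-in for both clustering stages.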