Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper

Hans-Peter Kriegel, Eirini Ntoutsi

Published: 2013, Last Modified: 07 Mar 2025. SIGKDD Explor. 2013. CC BY-SA 4.0

Abstract: The goal of this position paper is to contribute to a clear understanding of the commonalities and differences between subspace clustering and text clustering. Text data is often presented as an ideal fit for subspace clustering because of its high dimensionality and sparsity. Indeed, the two areas share similar challenges and the same goal: the simultaneous extraction of both the clusters and the dimensions in which these clusters are defined. However, there are fundamental differences between the two areas with respect to object feature representation, dimension weighting, and the incorporation of these weights into the dissimilarity computation. We attempt to bridge these two domains in order to facilitate the exchange of ideas and best practices between them.
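
To make the contrast in dimension weighting concrete, the following minimal sketch shows one way weights can enter a dissimilarity computation in each setting. It is an illustration under stated assumptions, not the method of the paper: the abstract names no specific schemes, so the choice of IDF weighting with cosine dissimilarity for text, the choice of per-cluster weighted Euclidean distance for subspace clustering, and all function names and toy data are assumptions made here.

```python
import math

def idf_weights(docs):
    """Global weights (assumed scheme): one IDF value per term for the whole corpus."""
    n = len(docs)
    vocab = {t for d in docs for t in d}
    return {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}

def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector: the global weight is baked into the representation."""
    return {t: doc.count(t) * idf[t] for t in set(doc)}

def cosine_dissimilarity(u, v):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def weighted_euclidean(x, y, w):
    """Local weights (assumed scheme): w belongs to one cluster, not to the data set."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w)))

# Text setting: hypothetical toy corpus of tokenized documents.
docs = [["apple", "banana"], ["apple", "cherry"], ["apple", "banana", "banana"]]
idf = idf_weights(docs)
vecs = [tfidf_vector(d, idf) for d in docs]
d_text_01 = cosine_dissimilarity(vecs[0], vecs[1])
d_text_02 = cosine_dissimilarity(vecs[0], vecs[2])

# Subspace setting: the same pair of points under two hypothetical clusters,
# each with its own relevance weights over the two dimensions.
x, y = (1.0, 5.0), (1.2, 9.0)
d_cluster_a = weighted_euclidean(x, y, (1.0, 0.0))  # cluster A uses only dimension 0
d_cluster_b = weighted_euclidean(x, y, (0.0, 1.0))  # cluster B uses only dimension 1
```

In this sketch the text weights are fixed once for the whole corpus and applied before any clustering takes place, whereas the subspace weights differ per cluster, so the same pair of objects is close with respect to one cluster and far with respect to another.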