Abstract: We introduce and theoretically study the convergence behavior of iterative matrix correlation computation and show how it can be leveraged to derive a novel bisection clustering algorithm with unique characteristics. The correlation matrix of n vectors is a symmetric n × n matrix whose (i, j)-th entry is the Pearson correlation coefficient between vectors i and j. We observe that, in general, iteratively recomputing the correlation matrix converges to a matrix in which every entry is either 1 or -1. Moreover, the same convergence behavior holds if, in each iteration, only a pre-determined subset of columns of the correlation matrix is carried into the next iteration of correlation computation. We prove this observation mathematically, analyze the convergence behavior, and propose a technique to improve efficiency. While this observation is significant in its own right and may have many applications, we focus on how to apply it to achieve bisection clustering. Clustering is a fundamental data mining task, and bisection clustering is particularly important because it can be used as a building block to construct hierarchical clustering and arbitrary k-clustering. The resulting algorithm, which we call Corbis, works in a fundamentally different way from existing algorithms and offers favorable characteristics and comparative advantages, especially for high-dimensional data. It can be an important addition to the arsenal of clustering algorithms.
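To make the iteration described above concrete, the following is a minimal sketch, not the authors' Corbis implementation: it repeatedly recomputes the Pearson correlation matrix of the columns of the previous correlation matrix and stops once all entries are close to ±1. The function name `iterate_correlation` and the final step that reads a two-way split off the sign pattern of one row are illustrative assumptions, not details taken from the paper.

```python
# Sketch (assumed, not the paper's implementation) of iterated correlation
# computation: starting from n data vectors, treat the columns of the current
# correlation matrix as new vectors and recompute their Pearson correlations.
# Per the abstract's claim, the entries tend toward +1/-1, and the resulting
# sign pattern suggests a two-way split (bisection) of the original vectors.
# Degenerate cases (e.g., a column becoming constant) are not handled here.
import numpy as np

def iterate_correlation(X, max_iter=50, tol=1e-8):
    """X: (d, n) array holding n vectors of dimension d as columns."""
    C = np.corrcoef(X, rowvar=False)           # n x n correlation matrix
    for _ in range(max_iter):
        C_next = np.corrcoef(C, rowvar=False)  # correlate the columns of C
        if np.max(np.abs(np.abs(C_next) - 1.0)) < tol:  # all entries ~ +/-1
            return C_next
        C = C_next
    return C

# Illustrative usage: the sign pattern of one row of the converged matrix
# induces a hypothetical two-way split of the 8 input vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                  # 8 random 100-dimensional vectors
C_inf = iterate_correlation(X)
labels = (C_inf[0] > 0).astype(int)
print(labels)
```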