Keywords: information-theoretic clustering, time-series clustering, clustering, information theory, mutual information, KL divergence, dual form, deep learning
Abstract: Information-theoretic clustering is one of the most promising and principled approaches to finding clusters with minimal a priori assumptions. The key criterion therein is to maximize the mutual information between the data points and their cluster labels. We instead propose to maximize the Kullback–Leibler (KL) divergence between the underlying distributions associated with the clusters (referred to as cluster distributions). We show this to be equivalent to optimizing the mutual information criterion while simultaneously maximizing the cross entropy between the cluster distributions. For practical efficiency, we propose to empirically estimate the KL divergence between clusters in its dual form, leveraging deep neural networks as dual function approximators. Remarkably, our theoretical analysis establishes that estimating the divergence measure in its dual form reduces the clustering problem to that of optimally finding k − 1 cut points for k clusters in the 1-D dual functional space. Overall, our approach enables linear-time clustering algorithms with theoretical guarantees of near-optimality, owing to the submodularity of the objective. We demonstrate the empirical superiority of our approach over current state-of-the-art methods on the challenging task of clustering noisy time series, as observed in domains such as neuroscience, healthcare, financial markets, and spatio-temporal environmental dynamics.
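To make the dual-form estimation concrete: the Donsker–Varadhan representation writes KL(P||Q) = sup_T E_P[T(X)] − log E_Q[e^{T(X)}], where the paper takes T to be a deep neural network. Below is a minimal, hedged sketch (not the authors' implementation) that illustrates the lower bound on two 1-D Gaussians, using a simple affine critic T(x) = a·x chosen by grid search; for N(1,1) vs. N(0,1) the true divergence is 0.5 and the log density ratio is itself affine, so the bound is tight.

```python
import math
import random

random.seed(0)

def dv_kl_lower_bound(xs_p, xs_q, a):
    """Donsker-Varadhan bound E_P[T] - log E_Q[exp(T)] with critic T(x) = a*x.

    (An additive constant in T cancels in the bound, so only the slope matters
    for an affine critic.)
    """
    ep = sum(a * x for x in xs_p) / len(xs_p)
    eq = sum(math.exp(a * x) for x in xs_q) / len(xs_q)
    return ep - math.log(eq)

n = 20000
xs_p = [random.gauss(1.0, 1.0) for _ in range(n)]  # samples from P = N(1, 1)
xs_q = [random.gauss(0.0, 1.0) for _ in range(n)]  # samples from Q = N(0, 1)

# Grid-search the critic slope; the optimal critic here is T(x) = x + const
# (the log density ratio), which attains the true KL(P || Q) = 0.5.
best = max(dv_kl_lower_bound(xs_p, xs_q, a)
           for a in [i / 10 for i in range(0, 21)])
print(round(best, 2))  # close to 0.5, up to sampling noise
```

In the paper's setting the scalar critic output T(x) plays the role of the 1-D dual functional space: once every point is mapped to its critic value, clustering into k groups amounts to choosing k − 1 cut points along that axis.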
Supplementary Material: pdf
Other Supplementary Material: zip