Semi-supervised Embedding for Scalable and Accurate Time Series Clustering

Andrew Hill, Russell Bowler, Katerina J. Kechris, Farnoush Banaei Kashani

Published: 01 Jan 2022, Last Modified: 13 Jan 2025, IEEE Big Data 2022, CC BY-SA 4.0

Abstract: While time series data are abundant in numerous real-world applications, large labeled time series datasets are scarce. Semi-supervised models, which leverage a small amount of labeled data along with a large set of unlabeled data, have been shown to significantly outperform unsupervised models that rely only on unlabeled data for time series clustering. However, existing semi-supervised time series clustering algorithms scale poorly because they are limited to performing learning operations in the original data space. We propose a scalable and accurate autoencoder-based semi-supervised learning model for time series clustering in the embedded space. With this model, we also introduce multiple semi-supervised objective functions that leverage only a small number of labeled examples yet significantly improve the quality of the autoencoder’s learned latent space for clustering. Our experiments on a variety of datasets show that our methods can often improve the performance of a typical clustering method (namely, k-means). We demonstrate that our methods achieve a maximum average Adjusted Rand Index (ARI) of 0.897, a 140% increase over an unsupervised Convolutional Autoencoder (CAE) model. Finally, our proposed methods achieve a maximum improvement of 44% over an existing semi-supervised model.
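
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of the standard Adjusted Rand Index, computed from the contingency table of two labelings. It is not the authors' code. The function name `adjusted_rand_index` and the toy label lists are assumptions made for illustration only. An ARI of 1.0 means the two clusterings agree up to a renaming of cluster ids, and values near 0 indicate chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two labelings of the same items (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two items: the labelings agree trivially.
        return 1.0

    # Contingency counts n_ij, plus row sums a_i and column sums b_j.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together under each view.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)       # upper bound on agreement
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster ids are swapped, but the grouping is identical.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In a pipeline like the one the abstract describes, `labels_pred` would be the k-means assignments computed on the learned embeddings, and `labels_true` the ground-truth classes held out for evaluation.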