Keywords: embedding clustering, tabular data, Gaussian clusters, autoencoder, representation learning
TL;DR: This paper proposes an unsupervised learning method that improves embedding clustering of tabular data.
Abstract: The latent representation of an autoencoder achieves dimensionality reduction by learning to reconstruct its input in a self-supervised manner. For images, the quality of latent representations has been improved by jointly learning a t-distributed embedding and clustering, an approach inspired by the neighborhood embedding concept proposed for data visualization. In this paper, we discuss the objectives of clustering and data visualization and present a novel Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS), which replaces t-distributions with Gaussian clusters. Unlike current methods, the proposed method defines the Gaussian embedding and the target cluster distribution independently, so that any clustering algorithm can be used in representation learning. G-CEALS outperforms six baseline clustering and cluster embedding methods on five out of seven tabular data sets and is on par with a cluster embedding method on the sixth data set. In general, G-CEALS outperforms all six methods for clustering tabular data when the data dimensionality is greater than ten. Recognizing that traditional machine learning outperforms deep learning on tabular data, this paper presents one of the first joint representation learning and clustering methods to improve the clustering of tabular data.
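To illustrate the substitution the abstract describes, the sketch below contrasts a Student's t kernel for soft cluster assignment (the form commonly used in t-distributed cluster embedding) with a Gaussian kernel. It is only an illustration under stated assumptions, not the paper's formulation: the function names, the shared isotropic `sigma`, the fixed centroids, and the toy data are all hypothetical, and the abstract does not specify how G-CEALS parameterizes its Gaussian clusters or its target distribution.

```python
import numpy as np


def squared_distances(z, centroids):
    """Pairwise squared Euclidean distances: z is (n, d), centroids is (k, d)."""
    diff = z[:, None, :] - centroids[None, :, :]
    return np.sum(diff ** 2, axis=2)  # shape (n, k)


def t_soft_assignment(z, centroids, alpha=1.0):
    """Soft assignment with a Student's t kernel (heavy tails)."""
    d2 = squared_distances(z, centroids)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def gaussian_soft_assignment(z, centroids, sigma=1.0):
    """Soft assignment with an isotropic Gaussian kernel.

    Assumption: one shared sigma for all clusters, chosen only for illustration.
    """
    d2 = squared_distances(z, centroids)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    q = np.exp(logits)
    return q / q.sum(axis=1, keepdims=True)


# Toy latent codes and two hypothetical cluster centroids.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])

q_t = t_soft_assignment(z, centroids)
q_g = gaussian_soft_assignment(z, centroids)

# A point sitting exactly on the first centroid.
on_centroid = np.array([[-1.0, 0.0]])
q_t_on = t_soft_assignment(on_centroid, centroids)
q_g_on = gaussian_soft_assignment(on_centroid, centroids)
```

With these toy settings, the Gaussian kernel assigns a point lying on a centroid more firmly to that cluster than the t kernel does, because the t kernel's heavy tails leave more probability mass for distant clusters.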
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning