Deep Goal-Oriented Clustering

TMLR Paper1633 Authors

30 Sept 2023 (modified: 15 Jan 2024). Rejected by TMLR.
Abstract: Clustering and prediction are two primary tasks in unsupervised and supervised machine learning, respectively. Although much of the recent progress in machine learning has centered on these two tasks, the interdependent, mutually beneficial relationship between them is rarely explored. In this work, we hypothesize that better prediction performance on a downstream task would inform a more appropriate clustering strategy. To this end, we introduce Deep Goal-Oriented Clustering (DGC), a probabilistic framework built on a variational autoencoder whose latent prior is a Gaussian mixture distribution. DGC clusters the data in an end-to-end fashion by jointly predicting the side-information and modeling the inherent structure of the data. We demonstrate the effectiveness of our model on a range of datasets: it achieves good prediction accuracy on the side-information while, more importantly in our setting, simultaneously learning congruent clustering strategies that are on par with the state of the art. We also apply DGC to a real-world breast cancer dataset and show that the discovered clusters carry clinical significance.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Vincent_Fortuin1
Submission Number: 1633
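The abstract above describes a variational autoencoder whose latent prior is a Gaussian mixture, with clusters that are also shaped by predicting side-information. The snippet below is a minimal sketch of two ingredients such a model could use. It is not the authors' implementation. The first ingredient is the cluster posterior p(c | z) ∝ π_c N(z; μ_c, σ_c²) under a diagonal Gaussian-mixture prior. The second is a side-information prediction formed by weighting per-cluster predictors by those posteriors. The function names, the hypothetical per-cluster linear heads, and the toy parameters are assumptions chosen for illustration. The encoder, decoder, and training objective are omitted.

```python
import math

def log_normal_diag(z, mean, var):
    """Log density of a diagonal Gaussian N(z; mean, diag(var))."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (zi - m) ** 2 / v)
        for zi, m, v in zip(z, mean, var)
    )

def cluster_responsibilities(z, weights, means, variances):
    """p(c | z) proportional to pi_c * N(z; mu_c, sigma_c^2) under a Gaussian-mixture prior."""
    logits = [math.log(w) + log_normal_diag(z, m, v)
              for w, m, v in zip(weights, means, variances)]
    top = max(logits)  # shift by the max logit (log-sum-exp trick) for numerical stability
    unnorm = [math.exp(l - top) for l in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def predict_side_info(z, resp, heads):
    """Mixture-weighted prediction: sum over c of p(c | z) * f_c(z).

    `heads` holds hypothetical per-cluster linear regressors (w_c, b_c);
    this weighting scheme is an illustrative assumption.
    """
    preds = [sum(wi * zi for wi, zi in zip(w, z)) + b for w, b in heads]
    return sum(r * p for r, p in zip(resp, preds))

# Toy usage: two well-separated clusters in a 2-D latent space.
weights = [0.5, 0.5]
means = [[-2.0, 0.0], [2.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
z = [1.8, 0.1]  # a latent code near the second cluster
resp = cluster_responsibilities(z, weights, means, variances)
heads = [([0.0, 0.0], -1.0), ([0.0, 0.0], 1.0)]  # constant heads: -1 and +1
y_hat = predict_side_info(z, resp, heads)
```

Because z lies near the second component, almost all posterior mass goes to that cluster, and the prediction is dominated by its head. Weighting by responsibilities, rather than taking a hard argmax, keeps the prediction differentiable with respect to the mixture parameters. That property matters for the end-to-end training the abstract mentions.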