Unsupervised image categorization based on deep generative models with disentangled representations and von Mises-Fisher distributions

Published: 01 Jan 2025, Last Modified: 01 Mar 2025 · Int. J. Mach. Learn. Cybern. 2025 · CC BY-SA 4.0
Abstract: Variational autoencoders (VAEs) have emerged as powerful deep generative models for learning abstract representations in the latent space, making them highly applicable across diverse domains. This paper presents a novel image categorization approach that leverages VAEs with disentangled representations. In VAE-based clustering models, the latent representations learned by encoders often entangle generation and clustering information. To address this issue, our proposed model disentangles the learned latent representations into dedicated clustering and generation modules, thereby improving both the performance and the efficiency of clustering. Specifically, we introduce an extension of the Kullback–Leibler (KL) divergence to promote independence between the two modules. Additionally, we incorporate the von Mises-Fisher (vMF) distribution to improve the clustering model's ability to capture cluster characteristics within the generation module. Extensive experimental evaluations confirm the effectiveness of our model on clustering tasks, notably without requiring pre-training. Furthermore, compared with various deep generative clustering models that do require pre-training, our model achieves comparable or superior performance across multiple datasets.
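The core idea of the abstract (an encoder whose latent output is split into a clustering block and a generation block, with the generation code constrained to the unit hypersphere as a vMF mean direction) can be illustrated with a minimal NumPy sketch. This is a toy linear stand-in for the paper's neural encoder; the names `z_cluster`, `z_gen`, `w_c`, and `w_g` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_c, w_g):
    """Toy linear 'encoder' mapping an input batch to two disentangled
    latent blocks: one for clustering, one for generation.
    (Illustrative only -- the paper's encoder is a neural network.)"""
    z_cluster = x @ w_c  # clustering module: soft cluster logits
    z_gen = x @ w_g      # generation module: raw latent code
    # vMF-style parameterisation: the generation code lives on the unit
    # hypersphere, so L2-normalise it to obtain the vMF mean direction.
    mu = z_gen / np.linalg.norm(z_gen, axis=-1, keepdims=True)
    # cluster assignment probabilities via a numerically stable softmax
    p = np.exp(z_cluster - z_cluster.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)
    return p, mu

x = rng.normal(size=(4, 8))    # batch of 4 toy inputs
w_c = rng.normal(size=(8, 3))  # 3 hypothetical clusters
w_g = rng.normal(size=(8, 5))  # 5-dim generation latent
p, mu = encode(x, w_c, w_g)
print(p.shape, mu.shape)       # (4, 3) (4, 5)
```

In the sketch, each row of `p` sums to one (a cluster assignment) and each row of `mu` has unit norm (a vMF mean direction); the paper's KL-divergence extension, which encourages independence between the two blocks, is not modelled here.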