Clustering-Based Automatic Codeword Lengths Determination in Self-Supervised Learning

Published: 2024, Last Modified: 30 Jul 2025 · ICMLC 2024 · CC BY-SA 4.0
Abstract: Unsupervised Continual Learning (UCL) extracts information from data without labels. In task-incremental learning, the classification accuracy of most existing UCL methods on the first few tasks is lower than on later tasks because of a lack of diverse information. One of the most effective UCL methods for addressing this problem is Codebook for Unsupervised Continual Learning (CUCL). CUCL quantizes feature vectors to obtain diverse feature representations, mitigating the performance problem on the first few tasks. In CUCL, the latent space is divided into subspaces (i.e., codebooks), and their representative points (i.e., codewords) quantize feature vectors so that essential feature information is preserved. However, the number of codewords (i.e., the codeword length) per subspace is the same across all codebooks, and the distribution of feature vectors within each subspace is not considered. In this paper, we examine how estimating the number of codewords with a clustering method affects model performance. Experimental results show the usefulness of clustering methods for determining codeword lengths.
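The abstract does not specify which clustering method is used to estimate per-subspace codeword lengths. The sketch below is only an illustration of the general idea, under our own assumptions: a NumPy-only k-means and a simple elbow-style stopping rule pick a codeword length for each subspace independently, and feature vectors are then quantized against the resulting codebooks. All function names, parameters, and the stopping criterion are illustrative, not the paper's method.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; returns (centers, labels). Illustrative, not CUCL's actual clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each center to the mean of its assigned points.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels

def estimate_codeword_length(X, k_max=8, drop_ratio=0.9):
    """Pick the smallest k after which distortion stops improving by at least
    (1 - drop_ratio). A stand-in elbow heuristic; the paper may use another criterion."""
    best_k, prev = 1, None
    for k in range(1, k_max + 1):
        centers, labels = kmeans(X, k)
        distortion = np.mean(np.linalg.norm(X - centers[labels], axis=1) ** 2)
        if prev is not None and distortion > drop_ratio * prev:
            break
        best_k, prev = k, distortion
    return best_k

def build_codebooks(features, n_subspaces=4, k_max=8):
    """Split features into subspaces and learn a per-subspace codebook whose
    length is chosen by clustering, rather than a fixed shared length."""
    books = []
    for Z in np.split(features, n_subspaces, axis=1):
        k = estimate_codeword_length(Z, k_max)
        centers, _ = kmeans(Z, k)
        books.append(centers)
    return books

def quantize(features, books):
    """Replace each subspace slice with its nearest codeword."""
    parts = np.split(features, len(books), axis=1)
    out = []
    for Z, C in zip(parts, books):
        dists = np.linalg.norm(Z[:, None, :] - C[None, :, :], axis=2)
        out.append(C[dists.argmin(axis=1)])
    return np.concatenate(out, axis=1)
```

Because each subspace chooses its own codeword length from its own data distribution, subspaces with more structure can receive more codewords than near-degenerate ones, which is the imbalance the fixed-length CUCL codebook ignores.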