Gradient-based training of Gaussian Mixture Models for High-Dimensional Streaming Data

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Gaussian Mixture Models, Stochastic Gradient Descent, Unsupervised Representation Learning, Continual Learning
Abstract: We present an approach for efficiently training Gaussian Mixture Models (GMMs) by SGD on non-stationary, high-dimensional streaming data. Our training scheme does not require data-driven parameter initialization (e.g., by k-means) and can process high-dimensional samples without numerical problems. Furthermore, the approach allows mini-batch sizes as low as 1, which are typical for streaming-data settings, and can react and adapt to changes in data statistics (concept drift/shift) without catastrophic forgetting. Major problems in such streaming-data settings are undesirable local optima during early training phases and numerical instabilities due to high data dimensionalities. We introduce an adaptive annealing procedure to address the first problem, whereas numerical instabilities are eliminated by using an exponential-free approximation to the standard GMM log-likelihood. Experiments on a variety of visual and non-visual benchmarks show that our SGD approach can be trained entirely without, for instance, k-means-based centroid initialization, and compares favorably to sEM, an online variant of EM.
One-sentence Summary: We present a method to train Gaussian Mixture Models by SGD that, unlike EM, requires no prior k-means initialization and is thus feasible for streaming data.
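To make the approach described in the abstract more concrete, below is a minimal, hypothetical sketch (not the authors' code) of SGD training of a diagonal-covariance GMM on single-sample streams, using a max-component approximation as one possible way to obtain an exponential-free objective; whether this matches the paper's exact formulation is an assumption. It assumes PyTorch, and the component count, data dimensionality, learning rate, and the omission of the adaptive annealing procedure are illustrative simplifications.

```python
# Illustrative sketch only: SGD-trained diagonal GMM with a max-component
# (exponential-free over components) log-likelihood approximation.
# Hyperparameters and the missing annealing schedule are assumptions,
# not values from the paper.
import torch


class MaxComponentGMM(torch.nn.Module):
    def __init__(self, n_components: int, dim: int):
        super().__init__()
        self.log_pi = torch.nn.Parameter(torch.zeros(n_components))          # mixture logits
        self.mu = torch.nn.Parameter(torch.randn(n_components, dim) * 0.1)   # centroids
        self.log_sigma = torch.nn.Parameter(torch.zeros(n_components, dim))  # log std-devs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-component diagonal-Gaussian log-densities (up to an additive constant),
        # shape (batch, K).
        sigma = self.log_sigma.exp()
        z = (x.unsqueeze(1) - self.mu) / sigma                    # (batch, K, dim)
        log_n = -0.5 * (z ** 2).sum(-1) - self.log_sigma.sum(-1)  # (batch, K)
        log_w = torch.log_softmax(self.log_pi, dim=0)
        # Max-component approximation: take max_k instead of log-sum-exp over k,
        # so no exponentiation of large negative log-densities is required.
        return (log_w + log_n).max(dim=1).values                  # (batch,)


gmm = MaxComponentGMM(n_components=8, dim=784)
opt = torch.optim.SGD(gmm.parameters(), lr=0.01)

for x in torch.randn(1000, 1, 784):   # simulated stream, mini-batch size 1
    loss = -gmm(x).mean()             # maximize the approximate log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design intuition behind such an approximation is that replacing the log-sum-exp over components with a max keeps the whole objective in log-space, which is where numerical problems for high-dimensional inputs typically arise when densities are exponentiated.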
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=Me_LVQeIh