Abstract: We introduce an online, time-dependent clustering algorithm that employs a dynamic probabilistic topic model. The algorithm handles data that evolves over time and is designed to capture how the clusters in the dataset evolve. It targets settings in which the entire dataset is never available at once (e.g., data streams), yet an up-to-date clustering of the data is required at any given time. A central challenge of the data stream setting is that the computational cost and memory overhead must remain bounded as the number of data points grows. The algorithm combines a Dirichlet process-based generative component with a sequential Monte Carlo (SMC) sampler for posterior inference. We also introduce a novel modification to the sampling process, called targeted sampling, which improves the performance of the SMC sampler. We evaluate the algorithm on both synthetic and real datasets.
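To make the combination of a Dirichlet process prior with an SMC sampler in a streaming setting more concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the authors' dynamic topic model and it does not implement targeted sampling, whose details the abstract does not give. Instead it assumes a static, one-dimensional Dirichlet process mixture of Gaussians with known within-cluster variance and a conjugate normal prior on cluster centres. Each particle stores only per-cluster counts and sums, so memory and per-point cost depend on the number of particles and clusters rather than on the number of points already seen. All class, method, and parameter names (`OnlineDPMixtureSMC`, `alpha`, `tau0_sq`, `sigma_sq`, and so on) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class Particle:
    """One clustering hypothesis, stored only as per-cluster sufficient statistics."""

    def __init__(self, counts=None, sums=None):
        self.counts = list(counts or [])  # n_k: number of points in cluster k
        self.sums = list(sums or [])      # sum of the points in cluster k

    def copy(self):
        return Particle(self.counts, self.sums)


class OnlineDPMixtureSMC:
    """Hypothetical SMC sampler for a 1-D DP mixture of Gaussians with known variance."""

    def __init__(self, num_particles=50, alpha=1.0, mu0=0.0, tau0_sq=25.0,
                 sigma_sq=1.0, seed=0):
        self.rng = random.Random(seed)
        self.alpha = alpha        # DP concentration parameter
        self.mu0 = mu0            # prior mean of cluster centres
        self.tau0_sq = tau0_sq    # prior variance of cluster centres
        self.sigma_sq = sigma_sq  # assumed known within-cluster variance
        self.particles = [Particle() for _ in range(num_particles)]
        self.log_weights = [0.0] * num_particles

    def _predictive(self, n, s):
        """Posterior predictive N(mean, var) for a cluster holding n points summing to s."""
        precision = 1.0 / self.tau0_sq + n / self.sigma_sq
        mean = (self.mu0 / self.tau0_sq + s / self.sigma_sq) / precision
        return mean, self.sigma_sq + 1.0 / precision

    def update(self, x):
        """Absorb one new point; cost is O(particles * clusters), independent of stream length."""
        for i, p in enumerate(self.particles):
            # Chinese-restaurant-process prior times predictive likelihood,
            # for each existing cluster and for opening a new one.
            probs = [n_k * normal_pdf(x, *self._predictive(n_k, s_k))
                     for n_k, s_k in zip(p.counts, p.sums)]
            probs.append(self.alpha * normal_pdf(x, *self._predictive(0, 0.0)))
            total = sum(probs)
            # Sample the assignment from its exact conditional; the incremental
            # importance weight is then the normalising constant `total`
            # (the 1 / (n + alpha) factor is shared by all particles, so it is dropped).
            k = self.rng.choices(range(len(probs)), weights=probs)[0]
            if k == len(p.counts):
                p.counts.append(1)
                p.sums.append(x)
            else:
                p.counts[k] += 1
                p.sums[k] += x
            self.log_weights[i] += math.log(total)
        self._resample_if_degenerate()

    def _normalised_weights(self):
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        z = sum(w)
        return [wi / z for wi in w]

    def _resample_if_degenerate(self):
        w = self._normalised_weights()
        ess = 1.0 / sum(wi * wi for wi in w)  # effective sample size
        if ess < 0.5 * len(self.particles):
            # Multinomial resampling, then reset to equal weights.
            idx = self.rng.choices(range(len(self.particles)), weights=w,
                                   k=len(self.particles))
            self.particles = [self.particles[j].copy() for j in idx]
            self.log_weights = [0.0] * len(self.particles)

    def best_particle(self):
        """Return the highest-weight particle as the current clustering."""
        w = self._normalised_weights()
        return self.particles[max(range(len(w)), key=w.__getitem__)]


if __name__ == "__main__":
    data_rng = random.Random(1)
    stream = [data_rng.gauss(-5.0, 1.0) if data_rng.random() < 0.5
              else data_rng.gauss(5.0, 1.0) for _ in range(200)]
    smc = OnlineDPMixtureSMC(num_particles=50, seed=0)
    for x in stream:
        smc.update(x)
    best = smc.best_particle()
    print("clusters:", len(best.counts), "sizes:", best.counts)
```

Sampling each assignment from its exact conditional (the "locally optimal" proposal) keeps the weight update to a single normalising constant, and resampling only when the effective sample size drops below half the particle count limits weight degeneracy. A time-dependent model such as the one described above would additionally need a mechanism for cluster statistics to evolve or decay between time steps, which this sketch omits.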