## Towards Coreset Learning in Probabilistic Circuits

13 Jun 2022, 15:37 (modified: 17 Aug 2022, 14:56) | TPM 2022 | Readers: Everyone
Keywords: Probabilistic Circuits, Coresets
Abstract: Probabilistic circuits (PCs) are a powerful family of tractable probabilistic models, guaranteeing efficient and exact computation of many probabilistic inference queries. However, their sparsely structured nature makes computations on large data sets challenging to perform. Recent works have focused on tensorized representations of PCs to speed up computations on large data sets. In this work, we present an orthogonal approach that sparsifies the set of $n$ observations, and we show that finding a coreset of $k \ll n$ data points can be phrased as a monotone submodular optimisation problem, which can be solved greedily for a deterministic PC with $|\mathcal{G}|$ nodes in $\mathcal{O}(k \, n \, |\mathcal{G}|)$ time. Finally, we verify on a series of data sets that our greedy algorithm outperforms random selection.
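
To make the greedy step concrete, the following is a minimal sketch of greedy maximisation of a monotone submodular set function under a cardinality constraint. It is not the objective or implementation from this work: the abstract does not state the PC-based objective, so a facility-location function over toy one-dimensional data is used as a stand-in. The names `facility_location`, `greedy_coreset`, `points`, and the similarity `1 / (1 + |x_i - x_j|)` are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def facility_location(points: Sequence[float], subset: Sequence[int]) -> float:
    """Stand-in objective (NOT the paper's): how well `subset` covers all points."""
    if not subset:
        return 0.0
    # Each point is credited with its similarity to the closest selected point.
    return sum(
        max(1.0 / (1.0 + abs(points[i] - x)) for i in subset)
        for x in points
    )


def greedy_coreset(n: int, k: int, objective: Callable[[List[int]], float]) -> List[int]:
    """Select k of n indices, adding the largest marginal gain at each step."""
    selected: List[int] = []
    for _ in range(min(k, n)):
        base = objective(selected)
        candidates = [i for i in range(n) if i not in selected]
        # Marginal gain of adding candidate i to the current selection.
        best = max(candidates, key=lambda i: objective(selected + [i]) - base)
        selected.append(best)
    return selected


points = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]  # toy data, made up for illustration
k = 3


def objective(subset: List[int]) -> float:
    return facility_location(points, subset)


coreset = greedy_coreset(len(points), k, objective)

# Brute-force optimum over all size-k subsets; feasible only on toy problems.
best_value = max(objective(list(c)) for c in combinations(range(len(points)), k))
# Classical guarantee for greedy on monotone submodular functions with f({}) = 0.
bound = (1.0 - 1.0 / math.e) * best_value
print(coreset, objective(coreset), best_value, bound)
```

In this sketch each of the $k$ rounds scores up to $n$ candidates, and each score costs one objective evaluation. With the stand-in objective that evaluation is itself linear in $n$; the $\mathcal{O}(k \, n \, |\mathcal{G}|)$ cost stated in the abstract would correspond to a marginal gain that can be evaluated in time linear in the circuit size, which this toy code does not model.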