Scaling Probabilistic Circuits via Monarch Matrices

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Probabilistic Circuits (PCs) are tractable representations of probability distributions that allow exact and efficient computation of likelihoods and marginals. Recent advancements have improved the scalability of PCs either by leveraging their sparse properties or through the use of tensorized operations for better hardware utilization. However, no existing method fully exploits both aspects simultaneously. In this paper, we propose a novel sparse and structured parameterization for the sum blocks in PCs. By replacing dense matrices with sparse Monarch matrices, we significantly reduce the memory and computation costs, enabling unprecedented scaling of PCs. From a theoretical perspective, our construction arises naturally from circuit multiplication; from a practical perspective, compared to previous efforts on scaling up tractable probabilistic models, our approach not only achieves state-of-the-art generative modeling performance on challenging benchmarks like Text8, LM1B and ImageNet, but also demonstrates superior scaling behavior, reaching the same performance with substantially less training compute, as measured by the number of floating-point operations (FLOPs).
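To illustrate the structure the abstract refers to (a sketch only, not the authors' implementation), a Monarch-style matrix replaces a dense n×n weight matrix with two block-diagonal factors interleaved with a fixed "transpose" permutation. The NumPy snippet below, with hypothetical shapes (n = b·b, blocks of size b×b), shows how a matrix–vector product with this parameterization is computed and why it is cheaper than a dense one.

```python
import numpy as np

def monarch_matvec(B1, B2, x):
    """Monarch-style matvec: block-diagonal, permute, block-diagonal.

    B1, B2: arrays of shape (b, b, b), i.e. b dense blocks of size b x b
            (in a PC sum block these weights would be nonnegative and
            normalized; omitted here for simplicity).
    x:      input vector of length n = b * b.
    Returns a vector of length n.
    """
    b = B1.shape[0]
    z = x.reshape(b, b)                    # split input into b groups of size b
    z = np.einsum('kij,kj->ki', B1, z)     # apply first block-diagonal matrix
    z = z.T                                # fixed transpose permutation mixes groups
    z = np.einsum('kij,kj->ki', B2, z)     # apply second block-diagonal matrix
    return z.reshape(-1)

# Example with hypothetical sizes: n = 64, b = 8.
b = 8
rng = np.random.default_rng(0)
B1 = rng.random((b, b, b))
B2 = rng.random((b, b, b))
x = rng.random(b * b)
y = monarch_matvec(B1, B2, x)
print(y.shape)  # (64,)
```

Under these assumptions, a dense matvec costs on the order of n² = b⁴ multiply-adds, while the Monarch form costs about 2b³, which is the kind of memory and FLOP reduction the paper exploits to scale up the sum blocks of PCs.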
Lay Summary: Deep generative models like transformers and diffusion models have demonstrated huge success in generating text and images. However, it is extremely hard to reliably control their behavior and outputs. This is due to their limited tractability: we can easily draw high-quality samples from these models, but they cannot reliably solve tasks such as "given 50 text segments, generate a story that uses all of them in a specific order" or "given 10 fragments of a picture, construct a complete picture using all of them". Probabilistic Circuits are a special family of deep generative models that can actually solve such challenging problems. However, it is quite challenging to scale up the training of Probabilistic Circuits so that they match the performance of models like GPT-3, and one fundamental bottleneck is that they are built on top of large dense matrices. In this work, we overcome this challenge by replacing these dense matrices with sparse, structured matrices, which require far less memory and computation while remaining amenable to efficient execution on GPUs. Our construction of such sparse structured matrices arises naturally from the multiplication of the probability distributions represented by two (or more) Probabilistic Circuits.
Primary Area: Probabilistic Methods->Graphical Models
Keywords: Probabilistic Circuits, Density Estimation
Submission Number: 5557