Discrete Markov Probabilistic Models: An Improved Discrete Score-Based Framework with sharp convergence bounds under minimal assumptions

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: A score-based algorithm for generating discrete data, together with its convergence bounds.
Abstract: This paper introduces the Discrete Markov Probabilistic Model (DMPM), a novel algorithm for discrete data generation. The algorithm operates in discrete space, where the noising process is a continuous-time Markov chain that can be sampled exactly via a Poissonian clock that flips labels uniformly at random. The time-reversal process, like the forward noise process, is a jump process, with its intensity governed by a discrete analogue of the classical score function. Crucially, this intensity is proven to be the conditional expectation of a function of the forward process, strengthening its theoretical alignment with score-based generative models while ensuring robustness and efficiency. We further establish convergence bounds for the algorithm under minimal assumptions and demonstrate its effectiveness through experiments on low-dimensional Bernoulli-distributed datasets and high-dimensional binary MNIST data. The results highlight its strong performance in generating discrete structures. This work bridges theoretical foundations and practical applications, advancing the development of effective and theoretically grounded discrete generative modeling.
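The forward noising process described in the abstract admits exact simulation: each coordinate of the binary state is flipped at the jump times of an independent Poisson clock. The sketch below illustrates this idea in Python; the names (`forward_noise`, `flip_rate`) and the unit-rate parameterization are illustrative assumptions and may differ from the paper's exact construction.

```python
# Minimal sketch of the forward noising process from the abstract:
# a continuous-time Markov chain on {0,1}^d whose coordinates are flipped
# by independent Poisson clocks. This is an assumed parameterization,
# not the paper's definitive implementation.
import numpy as np

def forward_noise(x0: np.ndarray, t: float, flip_rate: float = 1.0,
                  rng: np.random.Generator | None = None) -> np.ndarray:
    """Sample X_t exactly given X_0 = x0, a binary array in {0,1}^d.

    Each coordinate flips at the jump times of an independent Poisson
    process with intensity `flip_rate`; an even number of jumps leaves
    the bit unchanged, an odd number flips it.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Number of flips per coordinate up to time t.
    n_flips = rng.poisson(lam=flip_rate * t, size=x0.shape)
    # XOR with the parity of the flip count gives an exact sample of X_t.
    return x0 ^ (n_flips % 2)

# Equivalent closed form: each coordinate keeps its initial value with
# probability (1 + exp(-2 * flip_rate * t)) / 2, so the chain converges
# to the uniform distribution on {0,1}^d as t grows.
```

In the reverse direction, the abstract states that the jump intensity is governed by a discrete analogue of the score function, which is itself a conditional expectation of a function of the forward process; in practice this quantity would be approximated by a learned network, which the sketch above does not cover.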
Lay Summary: This paper presents a new algorithm for generating discrete data, like binary patterns or pixel images, using a mathematically grounded approach. The method works by gradually adding and removing noise in a controlled way, allowing it to learn how to produce realistic-looking data. Unlike many existing models, it is specifically designed for data made up of bits. We prove that the method is reliable and efficient, and show in experiments that it works well for both simple and complex datasets. Overall, the paper combines strong theory with practical results to advance how we generate structured, discrete data.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: generative models, score-based generative models, discrete data, categorical data, convergence bound
Submission Number: 7197