Keywords: diffusion, testing, genAI, minimax
Abstract: Diffusion models have demonstrated powerful generative capabilities, but their potential in statistical hypothesis testing remains underexplored. The score-based paradigm of diffusion formulates testing as the problem of detecting positive Fisher divergence between the noised null distribution and the noised, unknown data distribution. Diffusion models were originally proposed for generation because noising simplifies sampling, but they pose a conceptual puzzle in the context of hypothesis testing: the null and alternative hypotheses become harder to distinguish as the noise level increases. Consequently, beyond testing in Fisher divergence, diffusion models may face serious limitations in addressing fundamental hypothesis testing problems, such as testing in total variation distance. In this paper, we rigorously characterize the statistical limits of diffusion's score-based approach to testing. We derive the minimax rate of testing in Fisher divergence against a broad alternative hypothesis consisting of compactly supported densities assumed only to be bounded below by a constant. Notably, we capture the sharp scaling with respect to the noise level. We then turn to testing in total variation; since it is folklore that the problem is trivial without any regularity conditions, we study Hölder-smooth alternatives. As established in the literature, the Fisher divergence can be aggregated over noise levels to bound the total variation distance; hence, separation in total variation implies separation in aggregated Fisher divergence. After sharpening our Fisher divergence testing results to exploit the available smoothness, we show that an aggregation of test statistics furnishes a test achieving the sharp minimax testing rate in total variation. Hence, diffusion models are optimal for hypothesis testing.
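For readers unfamiliar with the aggregation step mentioned in the abstract, the following is a minimal sketch, assuming a standard Ornstein-Uhlenbeck (or heat) forward process and convention-dependent constants; it is an illustration of the folklore bound, not the paper's exact statement. Here p_t and q_t denote the data and null distributions after noising for time t.

% Sketch (assumed conventions): Fisher divergence at noise level t,
% where p_t, q_t are the noised data and null distributions.
\[
  \mathrm{FD}(p_t \,\|\, q_t)
  = \mathbb{E}_{X \sim p_t}\!\left[
      \bigl\| \nabla \log p_t(X) - \nabla \log q_t(X) \bigr\|^2
    \right].
\]
% A de Bruijn-type identity: relative entropy dissipates along the flow
% at a rate given by the relative Fisher information, so (when the KL
% divergence vanishes as t -> infinity, and up to convention-dependent
% constants)
\[
  \mathrm{KL}(p \,\|\, q)
  \;\lesssim\; \int_0^\infty \mathrm{FD}(p_t \,\|\, q_t)\, \mathrm{d}t,
\]
% and Pinsker's inequality converts this into the total variation bound
\[
  \mathrm{TV}(p, q)^2
  \;\le\; \tfrac12\, \mathrm{KL}(p \,\|\, q)
  \;\lesssim\; \int_0^\infty \mathrm{FD}(p_t \,\|\, q_t)\, \mathrm{d}t.
\]

In particular, if p and q are separated in total variation, then the aggregated Fisher divergence on the right-hand side must be bounded away from zero, which is why a test aggregated over noise levels can detect total variation separation.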
Primary Area: learning theory
Submission Number: 14494