Universality and kernel-adaptive training for classically trained, quantum-deployed generative models

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Instantaneous Quantum Polynomial circuit (IQP), quantum circuit Born machine (QCBM), maximum mean discrepancy, quantum generative models, universality, adaptive kernel
TL;DR: We prove that an IQP-QCBM generative model becomes universal by adding ancillary qubits, and propose a kernel-adaptive MMD training method that improves stability and performance over fixed kernels.
Abstract: The instantaneous quantum polynomial (IQP) quantum circuit Born machine (QCBM) has been proposed as a promising quantum generative model over bitstrings. Recent works have shown that the training of the IQP-QCBM is classically tractable w.r.t. the so-called Gaussian kernel maximum mean discrepancy (MMD) loss function, while maintaining the potential of a quantum advantage for sampling itself. Nonetheless, the model has a number of aspects where improvements would be important for more general utility: (1) the basic model is known to be not universal, i.e., not capable of representing arbitrary distributions, and it was not known whether universality can be achieved by adding hidden (ancillary) qubits; (2) a fixed Gaussian kernel used in the MMD loss can cause training issues, e.g., vanishing gradients, as we demonstrate in this paper. For the former, we prove that for an $n$-qubit IQP generator, adding $n + 1$ hidden qubits makes the model universal. For the latter, we propose a kernel-adaptive training method, where the kernel is adversarially trained. We formally prove that such adaptive kernels have strictly greater discriminative power, and also show that in the kernel-adaptive method, convergence of the MMD value implies convergence in distribution of the generator. We also analyze the limitations of the MMD-based training method. Finally, we verify the performance benefits of our contributions on a synthetic, parity-check dataset. The results show that the kernel-adaptive training method outperforms the Gaussian kernel w.r.t. the total variation distance between the generator and the data, and the advantage of the adaptive method grows as the qubit number increases.
These modifications and analyses shed light on the limits and potential of these new quantum generative methods, which could offer the first truly scalable insights into the comparative capacities of classical versus quantum models, even without access to scalable quantum computers.
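To make the abstract's training objective concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of the MMD loss over bitstring samples with a Gaussian kernel, plus a simple adversarial kernel-adaptation step that selects the most discriminative bandwidth from a candidate family; all function names and the candidate-grid strategy are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    # Pairwise Gaussian kernel values between rows (bitstrings as 0/1 vectors)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma):
    # Biased estimator of squared MMD between the sample sets X and Y
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

def adapt_sigma(X, Y, sigmas):
    # Adversarial step (illustrative): pick the bandwidth that maximizes
    # MMD^2, i.e., the kernel that best separates generator and data samples
    return max(sigmas, key=lambda s: mmd2(X, Y, s))
```

A fixed, poorly scaled `sigma` can drive all off-diagonal kernel values toward 0 or 1, flattening the loss surface (the vanishing-gradient issue the abstract mentions); maximizing MMD over a kernel family before each generator update is one way to keep the discrepancy informative.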
Primary Area: generative models
Submission Number: 7308