Keywords: dependence fidelity, generative models, multivariate dependence, trustworthy AI, inferential stability
TL;DR: Marginal fidelity alone does not ensure trustworthy generative AI; mismatched dependence can destabilize inference, while covariance-level control ensures stability.
Abstract: Recent advances in generative artificial intelligence have led to increasingly realistic synthetic data, yet the criteria used to evaluate such models remain largely focused on marginal distribution matching and likelihood-based measures. While these diagnostics assess local realism, they provide limited insight into whether a generative model preserves the multivariate dependence structures that govern downstream inference. In this work, we argue that this gap represents a fundamental limitation in current approaches to trustworthy generative modeling. We introduce covariance-level dependence fidelity as a practical and interpretable criterion for evaluating whether a generative distribution preserves the joint structure of data beyond univariate marginals. We formalize this notion through covariance-based measures and establish three core results. First, we show that distributions can match all univariate marginals exactly while exhibiting substantially different dependence structures, demonstrating that marginal fidelity alone is insufficient. Second, we prove that dependence divergence induces quantitative instability in downstream inference, including sign reversals in population regression coefficients despite identical marginal behavior. Third, we provide positive stability guarantees, showing that explicit control of covariance-level dependence divergence ensures stable behavior for dependence-sensitive tasks such as principal component analysis. Using minimal synthetic constructions, we illustrate how failures in dependence preservation lead to incorrect conclusions in extreme-event estimation and regression despite identical marginal distributions. Together, these results highlight dependence fidelity as a useful diagnostic for evaluating generative models in dependence-sensitive downstream tasks, with implications for diffusion models and variational autoencoders, and potential extensions to large language models as future work.
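The first two claims can be illustrated with a minimal Gaussian construction (our own sketch, not necessarily the paper's exact example): two bivariate normal distributions with identical standard-normal marginals but opposite correlation have identical univariate behavior, a large covariance-level dependence divergence, and regression slopes of opposite sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def sample_bivariate(rho, n):
    # Bivariate normal: both marginals are exactly N(0, 1); only the
    # dependence (off-diagonal covariance) changes with rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

# "Real" data vs. a hypothetical generator that matches marginals
# perfectly but reverses the dependence.
real = sample_bivariate(0.8, n)
synth = sample_bivariate(-0.8, n)

# Marginal fidelity: per-coordinate means and standard deviations agree.
print(real.mean(axis=0), synth.mean(axis=0))  # both close to [0, 0]
print(real.std(axis=0), synth.std(axis=0))    # both close to [1, 1]

# Covariance-level dependence divergence (Frobenius norm of the
# difference of sample covariance matrices) is far from zero.
div = np.linalg.norm(np.cov(real.T) - np.cov(synth.T))
print(div)  # roughly 1.6 * sqrt(2), about 2.26

# Downstream instability: the least-squares slope of y on x flips sign.
beta_real = np.polyfit(real[:, 0], real[:, 1], 1)[0]
beta_synth = np.polyfit(synth[:, 0], synth[:, 1], 1)[0]
print(beta_real, beta_synth)  # roughly +0.8 vs. -0.8
```

Any marginal-only diagnostic would score the synthetic distribution as a perfect match, while a regression fit on it reaches the opposite conclusion about the x-y relationship.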
Submission Number: 90