PolyGraph Discrepancy: a classifier-based metric for graph generation

Published: 26 Jan 2026, Last Modified: 03 Mar 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: graph generative models, model evaluation, maximum mean discrepancy, generative models
TL;DR: We propose a new, robust, and insightful classifier-based metric for evaluating graph generative models.
Abstract: Existing methods for evaluating graph generative models primarily rely on Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic parameters, namely kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. It approximates the Jensen-Shannon (JS) distance of graph distributions by fitting binary classifiers to distinguish between real and generated graphs, featurized by these descriptors. The data log-likelihood of these classifiers yields a variational lower bound on the JS distance between the two distributions. The resulting scores are constrained to the unit interval $[0,1]$ and are comparable across different graph descriptors. We further derive a theoretically grounded summary score that combines these individual metrics to provide a maximally tight lower bound on the distance for the given descriptors. Thorough experiments demonstrate that PGD provides a more robust and insightful evaluation compared to MMD metrics. A reference implementation of PGD is available at https://github.com/BorgwardtLab/polygraph-benchmark.
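The classifier-based bound the abstract describes can be sketched in a few lines. The idea follows the standard variational result that the optimal discriminator's mean log-likelihood equals the JS divergence minus $\log 2$, so any trained classifier gives a lower bound. This is a minimal illustration, not the paper's reference implementation: the feature matrices stand in for graph-descriptor featurizations, and `js_lower_bound` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def js_lower_bound(real_feats, gen_feats, seed=0):
    """Variational lower bound on the JS divergence (in bits, hence in
    [0, 1]) between two descriptor distributions, via a binary classifier.

    Hypothetical sketch; not the polygraph-benchmark API.
    """
    X = np.vstack([real_feats, gen_feats])
    y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(gen_feats))])
    # Held-out evaluation avoids an overly optimistic (invalid) bound.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    eps = 1e-12
    # Mean log-likelihood of the held-out labels (natural log).
    ll = np.mean(y_te * np.log(p + eps) + (1 - y_te) * np.log(1 - p + eps))
    # D_JS >= log 2 + mean log-likelihood (in nats); convert to bits, clip.
    return float(np.clip((np.log(2) + ll) / np.log(2), 0.0, 1.0))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 4))  # stand-in "real" descriptors
fake = rng.normal(3.0, 1.0, size=(500, 4))  # clearly different "generated" ones
far = js_lower_bound(real, fake)            # near 1: easy to tell apart
near = js_lower_bound(real, real[::-1].copy())  # near 0: same distribution
```

An indistinguishable generator drives the classifier to output 0.5, giving a mean log-likelihood of $-\log 2$ and a score of 0; a perfectly separable one yields a score of 1, which is what makes the values comparable across descriptors.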
Supplementary Material: zip
Primary Area: generative models
Submission Number: 7508