A Non-Parametric Test to Detect Data-Copying in Generative Models
Abstract: Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, where the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets.
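The abstract's three-sample idea can be illustrated with a small sketch. Assuming the test compares nearest-neighbor distances to the training set (a data-copying model sits suspiciously closer to training points than a fresh sample from the target distribution does), a rank-based two-sample comparison such as the Mann-Whitney U test flags this. The toy data, the brute-force nearest-neighbor helper, and the specific choice of Mann-Whitney U are illustrative assumptions, not necessarily the paper's exact construction:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def nn_dists(queries, train):
    # Distance from each query point to its nearest training point
    # (brute-force pairwise distances; fine for toy-sized data).
    d = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=2)
    return d.min(axis=1)

# Toy setup: training and held-out samples from the same target
# distribution; the "generative model" copies training points with
# tiny perturbations (i.e., it data-copies).
train = rng.normal(size=(200, 2))
held_out = rng.normal(size=(100, 2))
generated = train[rng.integers(0, 200, size=100)] + 0.01 * rng.normal(size=(100, 2))

# If the model data-copies, its samples' nearest-neighbor distances to
# the training set are systematically smaller than the held-out sample's.
stat, p = mannwhitneyu(nn_dists(generated, train),
                       nn_dists(held_out, train),
                       alternative="less")
print(f"p-value: {p:.2e}")  # small p => evidence of data-copying
```

A model that generalizes rather than copies would yield generated-sample distances statistically indistinguishable from the held-out distances, and the test would not reject.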