Generating High Fidelity Synthetic Data via Coreset selection and Entropic Regularization

Omead Pooladzandi; Pasha Khosravi; Erik Nijkamp; Baharan Mirzasoleiman

03 Oct 2022 (modified: 05 May 2023), NeurIPS 2022 SyntheticData4ML, Readers: Everyone

Keywords: Generative Modeling, Coreset, Data Augmentation, Semi-Supervised

TL;DR: We use a Latent Energy Based Model along with Coreset Methods to generate and select high fidelity samples in a semi-supervised way.

Abstract: Generative models can synthesize data points drawn from the data distribution; however, not all generated samples are of high quality. In this paper, we propose combining coreset selection methods with "entropic regularization" to select the highest-fidelity samples. We leverage an energy-based model that resembles a variational auto-encoder, with an inference model and a generator model, in which the latent prior is complexified by an energy-based model. In a semi-supervised learning scenario, we show that augmenting the labeled dataset with our selected subset of samples yields a larger accuracy improvement than adding all the synthetic samples.
