Abstract: Despite the rapid growth of Generative AI (e.g., ChatGPT, GPT-4, DALL-E 2), data
generated by these models may carry an inherent bias. This bias can propagate to downstream
tasks, e.g., classification and data augmentation, that use data from generative models. This thesis
empirically evaluates model bias in different deep generative models such as Variational
Autoencoders and PixelCNN++. Further, we reduce bias in the generated images by resampling the
generated data with importance sampling, following a recently proposed bias-reduction method
that estimates importance weights with probabilistic classifiers. The approach is developed in the
context of image generation, and we demonstrate that importance sampling can produce
better-quality samples with lower bias. Next,
we improve downstream classification by developing a semi-supervised learning pipeline where
we use importance-sampled data as unlabeled examples when training a classifier. Specifically, we
use the semantic loss function, which was proposed to impose constraints on unlabeled data and
thereby improve classification performance with limited labeled examples.
By using importance-sampled images, we impose constraints on the data instances that are most
informative for the classifier, allowing it to learn a better decision boundary from fewer labeled
examples.
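The two ingredients summarized above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual pipeline: the classifier probabilities `p_real` are hypothetical inputs (in practice they come from a trained real-vs-generated classifier), the density-ratio weights follow the classifier-based importance-weighting idea, and `semantic_loss` implements the exactly-one constraint commonly used with the semantic loss on unlabeled data.

```python
import numpy as np

def importance_weights(p_real):
    # Classifier-based density-ratio estimate: w(x) = p(real|x) / (1 - p(real|x)).
    # Samples the classifier thinks look "real" receive larger weights.
    return p_real / (1.0 - p_real)

def resample(samples, weights, n, rng):
    # Sampling-importance-resampling: draw with probability proportional to w.
    probs = weights / weights.sum()
    idx = rng.choice(len(samples), size=n, replace=True, p=probs)
    return samples[idx]

def semantic_loss(p):
    # Exactly-one constraint: -log sum_i p_i * prod_{j != i} (1 - p_j).
    # Lower for confident (near one-hot) predictions on unlabeled data.
    total = 0.0
    for i in range(len(p)):
        term = p[i]
        for j in range(len(p)):
            if j != i:
                term *= (1.0 - p[j])
        total += term
    return -np.log(total)

rng = np.random.default_rng(0)
# Hypothetical real-vs-generated classifier outputs for 5 generated samples.
p_real = np.array([0.2, 0.5, 0.8, 0.9, 0.1])
w = importance_weights(p_real)
samples = np.arange(5)
picked = resample(samples, w, 1000, rng)  # bias-corrected pool of samples
```

After resampling, `picked` over-represents the samples with high importance weights; feeding such a pool as the unlabeled set, with `semantic_loss` added to the supervised objective, is the combination the thesis evaluates.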