Keywords: Diffusion Models, Image Classification, Augmentation
TL;DR: We use controlled diffusion models to generate new high-quality input-output pairs without additional supervision.
Abstract: While synthetic data generated by diffusion models has been shown to improve performance across various tasks, existing approaches face two challenges: fine-tuning a diffusion model for a specific dataset is often expensive, and the domain gap between real and synthetic data limits the usefulness of synthetic data, especially in fine-grained classification settings. To mitigate these shortcomings, we develop CDaug, a novel approach to data augmentation based on controlled diffusion. Instead of using diffusion models to generate wholly new images, we take a self-supervised approach and condition the generated images on existing data, allowing us to create high-quality synthetic augmentations that capture the semantic priors and underlying structure of the data while introducing meaningful, novel variations with no human intervention. Our pipeline uses ControlNet, conditioned on the original images, together with captions generated by the multimodal LLM LLaVA2 to guide the generative process. Our approach relies only on open-source models, requires no fine-tuning, and is modular. We demonstrate improved performance across 7 fine-grained datasets, in both few-shot and full-dataset settings, and across multiple architectures.
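The abstract describes the pipeline only at a high level. The sketch below shows one plausible realization using the Hugging Face diffusers ControlNet API; the specific checkpoints, the Canny-edge conditioning signal, and the `caption_image` stub are illustrative assumptions standing in for the paper's ControlNet conditioning and LLaVA2 captioner, not the authors' released implementation.

```python
# Minimal sketch of a ControlNet-based augmentation step (assumed components, not the paper's code).
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline


def caption_image(image: Image.Image) -> str:
    # Placeholder for the multimodal-LLM captioner (LLaVA2 in the paper).
    # Any image-to-text model returning a short description could slot in here.
    return "a photo of a small bird perched on a branch"


def make_augmentation(image: Image.Image, seed: int = 0) -> Image.Image:
    # Derive a structural condition from the original image (here: Canny edges),
    # so the generated sample preserves the layout of the real example.
    edges = cv2.Canny(np.array(image), 100, 200)
    condition = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # Off-the-shelf, open-source models; no fine-tuning is performed.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # The caption guides the semantics; the edge map constrains the structure.
    prompt = caption_image(image)
    generator = torch.Generator("cuda").manual_seed(seed)
    out = pipe(prompt, image=condition, num_inference_steps=30, generator=generator)
    return out.images[0]
```

Because the synthetic image inherits the structure of a labeled original, it can reuse that original's label, yielding a new input-output pair without extra supervision.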
Submission Number: 51