Keywords: synthetic dataset, pre-training, vision transformer
TL;DR: We develop a procedure to automatically improve the quality of synthetic image datasets by exploiting the entropy of local features.
Abstract: As the construction of image datasets for pre-training large vision models grows at a rapid pace, it is becoming increasingly difficult to correct societal biases in these datasets and to mitigate the risks of violating privacy and copyright. Synthetic image datasets free of such risks and biases could substitute for part of the pre-training data. Unlike real image datasets, which can only grow in quantity and resolution, the quality of images in synthetic datasets can be improved continuously. However, previous efforts to improve synthetic datasets have required many trials and human intervention. In this study, to reduce the amount of pre-experimentation, we construct a dataset that achieves high classification accuracy by using indicators that do not require training. We propose an indicator for evaluating the quality of synthetic datasets based on the distribution of local features, and show that the entropy of local features in pre-training images correlates with accuracy on downstream tasks. Based on this result, we constructed HighEnt-1k, a synthetic image dataset generated automatically by maximizing the entropy of local features. Pre-training the DeiT-Tiny model on HighEnt-1k achieved an average classification accuracy of 89.0% across fine-tuning on 7 target datasets, which is comparable to the state-of-the-art VisualAtom model. Moreover, only a single automated generation trial, with no human intervention, was needed to achieve this result.
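The core quantity the abstract describes, the entropy of the distribution of local features in an image, can be sketched as follows. This is a minimal illustration, not the paper's method: the choice of patch-mean intensity as the "local feature", the non-overlapping patch grid, and the histogram binning are all assumptions made here for concreteness.

```python
import numpy as np

def local_feature_entropy(image, patch_size=8, n_bins=64):
    """Shannon entropy (bits) of a histogram over local features.

    Hypothetical sketch: uses the mean intensity of each
    non-overlapping patch as a toy local feature; the paper's
    actual feature extractor and binning are not specified here.
    Expects a 2-D array with values in [0, 1].
    """
    h, w = image.shape[:2]
    # Collect one scalar feature per non-overlapping patch.
    feats = []
    for i in range(0, h - patch_size + 1, patch_size):
        for j in range(0, w - patch_size + 1, patch_size):
            feats.append(image[i:i + patch_size, j:j + patch_size].mean())
    # Histogram the features and normalize to a probability distribution.
    counts, _ = np.histogram(feats, bins=n_bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins; 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

# A constant image has a single feature value, hence zero entropy;
# a noise image spreads its patch features over more bins.
rng = np.random.default_rng(0)
print(local_feature_entropy(np.zeros((64, 64))))     # 0.0
print(local_feature_entropy(rng.random((64, 64))))   # > 0
```

Under the paper's hypothesis, a generator would be steered to maximize such an entropy score, so that a single automated generation trial yields a pre-training dataset of high quality.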
Submission Number: 8