Abstract: Rendered images have been used to debug models, study inductive biases, and understand transfer learning. To scale up rendered datasets, we construct a pipeline spanning 40 classes of images, including furniture and consumer products, backed by 48,716 distinct object models, 480 environments, and 563 materials. Dataset diversity can be varied easily along four axes (object diversity, environment, material, and camera angle), making the dataset "programmable". Using this ability, we systematically study how these axes of data characteristics influence pretrained representations. We generate 21 datasets by reducing diversity along different axes and study performance on five downstream tasks. We find that reducing environment diversity has the biggest impact on performance and is the hardest to recover from by fine-tuning. We corroborate this by visualizing the models' representations, finding that models trained on diverse environments learn more visually meaningful features.