Benchmarking Vision Models Under Generative Continuous Nuisance Shifts

27 May 2024 (modified: 13 Nov 2024) · Submitted to the NeurIPS 2024 Datasets and Benchmarks Track · CC BY 4.0
Keywords: Generative models, benchmarking, computer vision
TL;DR: Benchmarking vision models using LoRA adapters for realizing continuous nuisance shifts
Abstract: An important challenge in evaluating the robustness of vision models is controlling individual nuisance factors independently. While simple synthetic corruptions are commonly applied to existing models, they do not capture all realistic and relevant distribution shifts of real-world images. To overcome this challenge, we apply LoRA adapters to diffusion models to realize a wide range of individual nuisance shifts in a continuous manner. Whereas existing generative benchmarks apply manipulations in a single step, we argue for gradual, continuous nuisance shifts, as they allow evaluating the sensitivity and failure points of vision models. With this in mind, we perform a comprehensive large-scale study evaluating the robustness and generalization of a wide range of classifiers under continuous nuisance shifts. Through carefully designed comparisons and analyses, we make several observations: 1) Modern, larger architectures trained on larger datasets tend to be more robust to nuisance shifts and fail only at larger shift scales. 2) The pre-training strategy influences robustness; fine-tuning a CLIP classifier improves standard accuracy but degrades robustness. 3) Accuracy drops capture only one dimension of robustness; failure-point analysis should be considered an additional dimension in robustness evaluation. We hope our continuous nuisance shift benchmark provides a new perspective on assessing the robustness of vision models.
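To make the core idea concrete, below is a minimal sketch of how a continuous nuisance shift and its failure point could be realized with the Hugging Face diffusers and torchvision libraries. It is not the paper's actual pipeline: the diffusion checkpoint, the nuisance LoRA path, the prompt, the classifier, and the scale grid are all illustrative assumptions. The only mechanism it demonstrates is the one the abstract describes, scaling a LoRA adapter's strength continuously and recording the smallest scale at which the classifier's prediction flips.

```python
# Sketch: continuous nuisance shifts via LoRA scale, assuming illustrative
# model IDs and a hypothetical nuisance LoRA checkpoint (not the paper's setup).
import torch
from diffusers import StableDiffusionPipeline
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Generator: a diffusion pipeline with a nuisance LoRA whose strength is
# controlled continuously through the LoRA scale.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
pipe.load_lora_weights("path/to/nuisance_lora")  # hypothetical adapter, e.g. a "snow" shift

# Classifier under test, with its matching preprocessing.
weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval().to(device)
preprocess = weights.transforms()

prompt = "a photo of a golden retriever"  # illustrative prompt
generator = torch.Generator(device)

def predict(image):
    """Return the classifier's top-1 ImageNet class index for a PIL image."""
    x = preprocess(image).unsqueeze(0).to(device)
    with torch.no_grad():
        return model(x).argmax(dim=1).item()

# Sweep the LoRA scale from 0 (no shift) to 1 (full shift) and record the
# failure point: the smallest scale at which the prediction flips.
clean_pred, failure_scale = None, None
for scale in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    generator.manual_seed(0)  # fixed seed: same image content, only the nuisance varies
    image = pipe(prompt, generator=generator,
                 cross_attention_kwargs={"scale": scale}).images[0]
    pred = predict(image)
    if scale == 0.0:
        clean_pred = pred
    elif pred != clean_pred and failure_scale is None:
        failure_scale = scale

print(f"failure point: {failure_scale}")
```

Reseeding the generator at each step keeps the latent noise (and thus the image content) fixed, so the LoRA scale is the only factor that varies; running the sweep over many prompts and classifiers would yield both the accuracy-drop curve and the failure-point distribution the abstract refers to.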
Supplementary Material: pdf
Submission Number: 1001