Keywords: Neural network learning mechanisms, Datasets for studying models’ properties, Adaptive model evaluation
Abstract: When a human undertakes a test, their responses likely follow a pattern: if they answered an easy
question (2×3) incorrectly, they would likely answer a more difficult one (2×3×4) incorrectly; and if
they answered a difficult question correctly, they would likely answer the easy one correctly. Anything
else hints at memorization. Do current visual recognition models exhibit a similarly structured
learning capacity? In this work, we consider the task of image classification and study whether these
models’ responses follow that pattern. Since real images are not labeled with difficulty, we first create
a dataset of 100 categories, 10 attributes, and 3 difficulty levels using recent generative models: for
each category (e.g., dog) and attribute (e.g., occlusion), we generate images of increasing difficulty
(e.g., a dog without occlusion, a dog only partly visible). We find that most of the models do in fact
behave consistently with the aforementioned pattern around 80–90% of the time. Using this property, we
then explore a new way to evaluate these models. Instead of testing a model on every possible test
image, we create an adaptive test akin to the GRE, in which the model’s performance on the current
round of images determines the test images in the next round. This lets the model skip questions
that are too easy or too hard for it, and lets us estimate its overall performance in fewer steps.
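Purely as an illustration of the kind of adaptive loop the abstract describes (not the authors' actual protocol), below is a minimal Python sketch. The `pool` structure, the `classify` callback, and the threshold-based difficulty update rule are all hypothetical assumptions introduced for this example.

```python
# Minimal sketch of an adaptive evaluation loop in the spirit of the abstract.
# Assumptions (not from the paper): `pool` maps difficulty level -> list of images,
# `classify(model, batch)` returns a list of per-image correctness booleans, and a
# simple rule moves up one difficulty level when round accuracy is high and down
# one level when it is low.
import random

def adaptive_test(model, pool, classify, rounds=10, batch_size=20,
                  up_thresh=0.8, down_thresh=0.5):
    levels = sorted(pool)             # e.g., [1, 2, 3] for easy/medium/hard
    level = levels[len(levels) // 2]  # start at a middle difficulty
    history = []
    for _ in range(rounds):
        batch = random.sample(pool[level], min(batch_size, len(pool[level])))
        correct = classify(model, batch)      # one boolean per image in the batch
        acc = sum(correct) / len(correct)
        history.append((level, acc))
        idx = levels.index(level)
        if acc >= up_thresh and idx < len(levels) - 1:
            level = levels[idx + 1]           # round was too easy: harder images next
        elif acc <= down_thresh and idx > 0:
            level = levels[idx - 1]           # round was too hard: easier images next
    return history
```

In this sketch, the sequence of visited difficulty levels and per-round accuracies in `history` stands in for the model's overall performance estimate, reached without exhausting every test image.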
Primary Area: interpretability and explainable AI
Submission Number: 653