Removing High Frequency Information Improves DNN Behavioral Alignment

ICLR 2024 Workshop Re-Align Submission 48 Authors

Published: 02 Mar 2024, Last Modified: 15 Apr 2024 · ICLR 2024 Workshop Re-Align Poster · CC BY 4.0
Track: short paper (up to 5 pages)
Keywords: error-consistency, shape-bias, alignment, low-frequency
Abstract: Despite their increasingly impressive performance and capabilities, to date there still exists a significant misalignment between Deep Neural Networks (DNNs) and human behavior. A large body of research identifies these misalignments and explores their origins, with some work attributing them to the fact that humans and DNNs use the frequency spectrum of images differently. In this paper, we show that removing high-frequency information by applying blur and resize transformations to images before they are fed to a DNN dramatically improves its alignment with humans according to shape-bias and error-consistency. Specifically, a ViT-H-14 OpenCLIP model tested on blurred images ($\sigma=2.5$) achieves an error-consistency with humans of $\kappa=0.37$, halving the current gap between DNN-human and human-human error-consistency. While these operations do affect a model's accuracy, we present preliminary evidence for an alignment-accuracy tradeoff, and note that moving forward, practitioners may have to choose between a model with superhuman accuracy and one that behaves like a human.
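The abstract's preprocessing step, removing high-frequency information via a Gaussian blur with $\sigma=2.5$, can be illustrated in a few lines. The paper's exact pipeline is not specified here, so the following is a minimal NumPy sketch under assumed conventions (a separable 1-D kernel truncated at roughly 3σ, applied along each image axis); the function names `gaussian_kernel` and `blur` are illustrative, not from the paper.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # 1-D Gaussian kernel, truncated at ~3 sigma and normalized to sum to 1
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(image, sigma=2.5):
    # Separable Gaussian blur: convolve each column, then each row.
    # This attenuates high spatial frequencies before the image
    # would be resized and passed to the DNN.
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, image)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)
    return out

# Example: blurring a unit impulse spreads its energy over a neighborhood
img = np.zeros((32, 32))
img[16, 16] = 1.0
blurred = blur(img, sigma=2.5)
```

In a real evaluation pipeline this step would sit before the model's standard resize/normalize transforms (e.g. in a `torchvision` transform chain), so the network only ever sees the low-pass-filtered image.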
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 48