Human and Deep Neural Network Alignment in Navigational Affordance Perception

ICLR 2024 Workshop Re-Align Submission 61 Authors

Published: 02 Mar 2024, Last Modified: 30 Apr 2024, ICLR 2024 Workshop Re-Align Poster, CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: scene perception, navigational affordances, path drawings, DNNs
TL;DR: Our study presents path annotations as a challenging benchmark to investigate human-DNN alignment in perceiving navigational affordances, revealing that DNNs represent objects well, but fail to capture navigational affordances.
Abstract: Moving through the world requires extracting navigational affordances from the immediate visual environment. How do humans compute this information from visual inputs? Over the last decade, Deep Neural Networks (DNNs) trained on visual recognition tasks have been shown to predict human perception remarkably well in the domain of object recognition, but their alignment with humans in other visual task contexts, such as spatial navigation, remains less well understood. Here, we investigated the alignment of DNNs with human-perceived navigational affordances across a broad variety of visual environments, using explainable AI and different model training objectives. We curated a diverse set of naturalistic real-world indoor, outdoor man-made, and outdoor natural scenes. For each scene, we gathered human annotations identifying the objects present and collected drawings of the path trajectories that participants would take through the scene. Quantitative analysis of the path annotations shows that participants perceive and choose similar paths in each environment, indicating that diagnostic features for navigational affordances are present in the images. Using representational similarity analysis, we found that DNN features correlate only weakly with information relevant to navigational affordances, such as mean pathways and floor segmentation, and slightly better with estimated depth information. However, these correlations are substantially lower than those with the representational space of the objects contained in the scenes. These findings indicate that DNNs represent object information rather than navigational affordances. This establishes our path annotations as a rich and challenging benchmark for studying human-DNN alignment and shows that commonly used DNNs do not yet capture navigational affordance representations well.
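The representational similarity analysis referenced in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the variable names, feature dimensions, and synthetic random data below are placeholder assumptions, shown only to convey the general RDM-comparison procedure.

```python
import numpy as np

def rdm(features):
    # Representational dissimilarity matrix:
    # 1 - Pearson correlation between every pair of scene feature vectors.
    return 1.0 - np.corrcoef(features)

def upper(mat):
    # Condensed upper triangle of a square RDM (diagonal excluded).
    i, j = np.triu_indices_from(mat, k=1)
    return mat[i, j]

def spearman(a, b):
    # Spearman rho = Pearson correlation of the rank-transformed vectors
    # (no tie handling; fine for continuous-valued dissimilarities).
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

# Hypothetical data: 50 scenes, each with a DNN feature vector and a
# behavioral feature vector (e.g. a vectorized mean-path annotation map).
rng = np.random.default_rng(0)
n_scenes = 50
dnn_feats = rng.standard_normal((n_scenes, 512))
path_feats = rng.standard_normal((n_scenes, 64))

# Second-order comparison: correlate the two models' RDMs.
rho = spearman(upper(rdm(dnn_feats)), upper(rdm(path_feats)))
print(rho)
```

In this framing, a low `rho` between the DNN RDM and the path-annotation RDM (relative to, say, an object-annotation RDM) would correspond to the abstract's finding that DNN features align better with object information than with navigational affordances.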
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 61