Keywords: binding, compositionality, linear representation
TL;DR: This paper introduces geometric, functional, and behavioral tools to quantify the extent to which vision models bind features into coherent objects.
Abstract: Existing studies of neural networks have focused largely on $\textit{compositionality}$—whether individual features can be linearly decoded and reused—while overlooking the equally important issue of $\textit{binding}$, i.e., how features are linked together to form coherent objects. This leaves a gap in understanding whether models truly represent feature conjunctions rather than mere unstructured feature bags. We propose a geometric and functional framework for quantifying binding, introducing a binding score based on principal angles between concept subspaces and validating it with linear and non-linear probes. To complement this, we design a behavioral diagnostic dataset in which pairs of images share identical feature bags but differ in how those features are bound into objects. Together, these tools highlight binding as a distinct and measurable dimension of representation, providing a way to diagnose where current vision models succeed—and where they fail—in capturing object structure.
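The abstract's geometric ingredient—principal angles between concept subspaces—can be sketched in a few lines. The sketch below is illustrative only: the subspaces are random stand-ins for learned concept subspaces, and the aggregate score (mean cosine of the principal angles) is a hypothetical choice, not the paper's actual binding score.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)

# Hypothetical setup: two concept subspaces in a d-dimensional feature space,
# each spanned by the columns of a (d, k) matrix. In practice these would be
# estimated from model activations (e.g., via PCA over concept exemplars).
d, k = 64, 4
A = rng.standard_normal((d, k))  # stand-in for one concept subspace
B = rng.standard_normal((d, k))  # stand-in for another concept subspace

# Principal angles between span(A) and span(B), in radians, largest first.
angles = subspace_angles(A, B)

# Illustrative aggregate (an assumption, not the paper's definition):
# mean cosine of the principal angles, so 0 = orthogonal subspaces,
# 1 = identical subspaces.
score = float(np.cos(angles).mean())
```

`scipy.linalg.subspace_angles` orthonormalizes its inputs internally, so the column matrices need not be orthonormal bases.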
Submission Number: 90