The Independent Compositional Subspace Hypothesis for the Structure of CLIP's Last Layer

Max Wolff; Wieland Brendel; Stuart Wolff

Published: 04 Mar 2023, Last Modified: 16 May 2023
ME-FoMo 2023 Poster
Readers: Everyone

Abstract: We propose the hypothesis that CLIP disentangles compositional visual attributes into orthogonal, independent subspaces, which it uses to build compositional representations of images. The hypothesis suggests that CLIP learns compositional techniques similar to those humans use. We identify five core compositional attributes predicted by the hypothesis: color, size, counting, camera view, and pattern. We empirically test the properties of the corresponding subspaces and find that each codes for its respective attribute type and that they are essentially orthogonal to one another and to the subject of the image.
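
As a rough illustration of what "essentially orthogonal" could mean in practice, the sketch below estimates one direction per attribute from pairs of embeddings that differ only in that attribute, then compares the directions by cosine similarity. This is not the authors' procedure. The helper names (`cosine`, `mean_difference`) and the 4-dimensional toy vectors are hypothetical stand-ins for real CLIP embeddings, and a single direction is a simplification of a full subspace.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_difference(pairs):
    """Average of (after - before) over pairs that differ in one attribute."""
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for before, after in pairs:
        for i in range(dim):
            total[i] += after[i] - before[i]
    return [t / len(pairs) for t in total]


# Toy 4-d vectors (hypothetical, NOT CLIP outputs). Each pair holds the
# subject fixed and changes a single attribute.
color_pairs = [
    ([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]),
    ([0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]),
]
size_pairs = [
    ([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]),
    ([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]),
]

color_dir = mean_difference(color_pairs)
size_dir = mean_difference(size_pairs)

# A value near 0 means the two attribute directions are close to orthogonal.
similarity = cosine(color_dir, size_dir)
print(f"cosine(color, size) = {similarity:.3f}")
```

With real embeddings, the same comparison would be run over every pair of attributes and against directions that encode the image subject. Values near zero would be consistent with the hypothesis, though they would not establish it.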
