Keywords: compositionality, linearity, VLM, MAP
TL;DR: We model compositional pliability in vector composition by defining ideal words as distributions.
Abstract: Vision-Language Models (VLMs) organize concepts into shared embedding spaces, enabling compositional reasoning across modalities. Prior work has demonstrated that composite concepts can be constructed by combining “ideal words” derived from attribute–object pairs. However, these approaches rely solely on mean representations, neglecting the uncertainty inherent in the embeddings. In this work, we introduce Probabilistic Decomposable Embeddings (PDE), a framework that explicitly models ideal words as distributions. Instead of simply averaging attribute and object vectors, PDE formulates composition as a maximum a posteriori (MAP) estimation problem, producing composite embeddings biased toward concepts with lower variance. This probabilistic treatment yields partner-aware, precision-weighted composites with a simple count-based scale recovery. We first visualize PDE, showing that it reorients composite directions toward higher-precision axes while decoupling direction from scale. On compositional classification, PDE often matches or surpasses linear decomposable embeddings and geodesically decomposable embeddings in both modalities, improving harmonic mean and AUC. These results highlight compositional pliability as a useful inductive bias for uncertainty-aware composition in VLM embeddings.
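To make the precision-weighting idea concrete, here is a minimal sketch of what MAP-style composition of two Gaussian "ideal word" embeddings could look like, assuming diagonal (per-dimension) variances and independent attribute/object distributions; the function name, arguments, and the count-based scale factor are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def compose_map(mu_attr, var_attr, mu_obj, var_obj, scale=1.0):
    """Precision-weighted (inverse-variance) combination of two Gaussian
    'ideal word' embeddings, i.e. a MAP-style composite mean.

    mu_*  : mean embedding vectors, shape (d,)
    var_* : per-dimension variances, shape (d,)
    scale : illustrative count-based scale factor (assumption)
    """
    prec_attr = 1.0 / var_attr   # precision of the attribute embedding
    prec_obj = 1.0 / var_obj     # precision of the object embedding
    # Dimensions with lower variance receive higher weight, biasing the
    # composite toward the more certain concept.
    composite = (prec_attr * mu_attr + prec_obj * mu_obj) / (prec_attr + prec_obj)
    # Recover the norm separately so direction and scale stay decoupled.
    return scale * composite / np.linalg.norm(composite)
```

Under this sketch, plain averaging is recovered when both variances are equal, while unequal variances tilt the composite direction toward the higher-precision concept, matching the behavior described in the abstract.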
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 11189