Understanding Compositionality in Data Embeddings

TMLR Paper 2331 Authors

04 Mar 2024 (modified: 11 Mar 2024) · Under review for TMLR
Abstract: Embeddings are used in AI to represent symbolic structures such as knowledge graphs. However, the representations obtained cannot be directly interpreted by humans and may contain unintended information. We investigate how data embeddings incorporate such information even when it is not used during training. We introduce two methods: (1) Correlation-based Compositionality Detection, which measures the correlation between known attributes and embeddings, and (2) Additive Compositionality Detection, which decomposes embeddings into an additive composition of individual vectors representing attributes. We apply our methods in two domains: word and sentence embeddings, and knowledge graph embeddings. We show that word embeddings can be interpreted as composed of semantic and morphological information, and that sentence embeddings can be interpreted as the sum of individual word embeddings. In the domain of knowledge graph embeddings, our methods show that attributes of graph nodes can be inferred even when those attributes are not used in training the embeddings. Our methods improve on previous approaches for decomposing embeddings in that they are 1) more general, applying to multiple embedding types; 2) quantitative, characterizing the decomposition numerically; and 3) statistically robust, providing a principled metric for determining the decomposition of an embedding.
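The additive decomposition the abstract describes can be illustrated with a small sketch: model each embedding as the sum of vectors for its attributes, fit those vectors by least squares, and score how well the additive model reconstructs the embeddings. All names and the synthetic data below are illustrative assumptions, not the paper's actual method or datasets.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation) of additive
# compositionality detection: embeddings ~ sum of per-attribute vectors.
rng = np.random.default_rng(0)
n_items, n_attrs, dim = 200, 6, 16

# Binary attribute assignments: which attributes each item has.
A = rng.integers(0, 2, size=(n_items, n_attrs)).astype(float)

# Synthetic ground truth: embeddings are an additive composition plus noise.
V_true = rng.normal(size=(n_attrs, dim))
E = A @ V_true + 0.01 * rng.normal(size=(n_items, dim))

# Recover attribute vectors by solving min ||A V - E||^2 in least squares.
V_hat, *_ = np.linalg.lstsq(A, E, rcond=None)

# R^2 of the additive reconstruction: values near 1 indicate the
# embeddings decompose well into a sum of attribute vectors.
residual = E - A @ V_hat
r2 = 1.0 - residual.var() / E.var()
print(round(float(r2), 3))
```

A correlation-based variant would instead test how strongly each attribute predicts directions in embedding space, for example by correlating attribute indicators with embedding coordinates.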
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Massimiliano_Mancini1
Submission Number: 2331