Understanding Compositionality in Data Embeddings

TMLR Paper2331 Authors

04 Mar 2024 (modified: 31 May 2024) · Rejected by TMLR
Abstract: Embeddings are used in AI to represent symbolic structures such as knowledge graphs. However, the representations obtained cannot be directly interpreted by humans, and may further contain unintended information. We investigate how data embeddings might incorporate such information, despite that information not being used during the training process. We introduce two methods: (1) Correlation-based Compositionality Detection, which measures the correlation between known attributes and embeddings, and (2) Additive Compositionality Detection, a process of decomposing embeddings into an additive composition of individual vectors representing attributes. We apply our methods across three domains: word embeddings using word2vec, which is based on a shallow, two-layer neural network model; sentence embeddings using SBERT, which uses a transformer architecture; and knowledge graph embeddings. We show that word embeddings can be interpreted as composed of semantic and morphological information, and that sentence embeddings can be interpreted as the sum of individual word embeddings. In the domain of knowledge graph embeddings, our methods show that attributes of graph nodes can be inferred, even when these attributes are not used in training the embeddings. Our methods improve over previous approaches for decomposing embeddings in that they 1) are more general: they can be applied to multiple embedding types; 2) provide quantitative information about the decomposition; and 3) provide a statistically robust metric for determining the decomposition of an embedding.
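To make the abstract's two methods concrete, here is a minimal Python sketch. The exact formulations are not given in this text, so the Pearson-correlation probe and the least-squares additive fit below are illustrative assumptions rather than the authors' implementation, and all data in the example is synthetic.

```python
# Hedged sketch of the two detection methods on toy data: the formulations
# below (Pearson correlation per dimension; least-squares additive fit) are
# plausible assumptions, not the paper's confirmed definitions.
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n items, each with a d-dimensional embedding and k binary attributes.
n, d, k = 200, 16, 3
attributes = rng.integers(0, 2, size=(n, k)).astype(float)  # known attribute labels
attr_vectors = rng.normal(size=(k, d))                      # hidden per-attribute vectors
embeddings = attributes @ attr_vectors + 0.1 * rng.normal(size=(n, d))

# (1) Correlation-based Compositionality Detection (assumed form): Pearson
# correlation between each attribute and each embedding dimension. A large |r|
# suggests the attribute is linearly encoded in the embedding.
def attribute_dimension_correlations(E, A):
    E_c = E - E.mean(axis=0)
    A_c = A - A.mean(axis=0)
    cov = A_c.T @ E_c / (len(E) - 1)                        # (k, d) covariances
    return cov / np.outer(A_c.std(axis=0, ddof=1), E_c.std(axis=0, ddof=1))

r = attribute_dimension_correlations(embeddings, attributes)
print("max |correlation| per attribute:", np.abs(r).max(axis=1))

# (2) Additive Compositionality Detection (assumed form): fit one vector per
# attribute by least squares so that each embedding is approximated by the sum
# of the vectors of its active attributes; the residual size quantifies how
# well the embedding decomposes additively.
fitted = np.linalg.lstsq(attributes, embeddings, rcond=None)[0]
rel_error = (np.linalg.norm(embeddings - attributes @ fitted)
             / np.linalg.norm(embeddings))
print(f"relative reconstruction error: {rel_error:.3f}")
```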
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We would like to once again thank the reviewers for their helpful comments and engagement during the discussion period.

Changes:
- We updated the abstract to be less general, taking the investigated models into account (ELjJ).
- We added a motivation for the models, evaluation task, and methods in Sec. 1 (ELjJ).
- We engage more thoroughly with prior work (Sec. 1) (VAT4).
- We carried out experiments to see if the output embeddings of words exhibit similar levels of compositionality (WP56).
- We carried out experiments on different training stages of BERT (MultiBERTs) (ELjJ and VAT4).
- We further carried out experiments to check the compositionality captured in different layers of SBERT; these can serve as additional baselines (ELjJ, VAT4, and WP56).
- We carried out experiments on different training stages of knowledge graph embeddings (VAT4).

Specific changes by reviewer:

Reviewer VAT4
> Toning down claims substantially.
- We have toned down claims and clarified why we believe that the claims we make about additive compositionality in the sentence embedding area do hold (Section 4.2.2).
> Engagement with prior work.
- We have revised our related work and included it in Sec. 1.
> Adding some more embedding types.
- In the word embedding experiments, we cannot use contextual word embeddings, because words are tokenized into subwords and combined additively to create word embeddings. For sentence embeddings, we have extended the results to look at BERT models through training (using the MultiBERTs) and at the additive compositionality of embeddings through layers (pages 17-20).
> Writing revisions for concision and clarity.
- In particular, related work has been consolidated and placed in the Introduction.
> Demonstrating more complete additive reconstruction ...
- For the sentence embeddings and most of the graph embedding experiments, we do argue that the additive reconstruction is good.

Reviewer ELjJ
> Provide a motivation for the choice of models [etc].
- We have added a paragraph on motivations to Sec. 1.
> Potentially consider the use of "modern" embeddings/language models.
- In the word embedding experiments, we cannot use contextual word embeddings, as words are tokenized into subwords and combined additively to create word embeddings. For the sentence embeddings, we have extended the results to look at BERT models through training (using the MultiBERTs) and at the additive compositionality of embeddings through layers (pages 17-20).
> Better distinguish between debiasing and compositionality.
- Removed the discussion of debiasing.
> Usage of more advanced/complex benchmarking methods [etc].
- We have added further baselines for the sentence embedding results, and added results on performance at different layers and through training (Sec. 4.2.3).
> Provide more details on SBERT [etc].
- We have clarified this and added additional baselines (Sec. 2 and 4.2.3).
> Page 3: can existing approaches be applied to other types of embeddings?
- We have added our previous response to the document, page 3.
> This could be a phrasing/clarification issue: it sounds like, for knowledge graphs, the embeddings are calculated from the user information matrix, rather than the embeddings being decomposed into the user matrix.
- Please see our previous response.
> The abstract makes general claims about composability.
- We have edited the abstract to be more specific.
> Introduction: potentially include a figure.
- We have not yet completed this.
> Update Figure 2: it shows graphs for each of the embeddings.
- The idea is to convey that sentences and words can also be thought of as graphs, where the edges are semantic similarity.
> The beginning of Section 2.2 is vague.
- Edited and moved to the Introduction.
> Section 2.2.3 speaks a lot about bias and not knowledge graphs.
- Edited.
> Section 4.1: unclear if static embeddings can encode syntactic information.
- Discussed further in Sec. 1.
> Section 4.1.4: unclear what shuffling is doing.
> Section 4.3.3: what are the 14 groups and why are they chosen?
- Clarified (Sec. 4.3.3).
> Section 4.3.4: what are the 241 groups?
- Clarified (Sec. 4.3.4).

Reviewer WP56
> [Remove discussion of kernel approach.]
- We have revised the paper to place our contributions in context more thoroughly.
> The authors should revise the sections relevant to the sentence embeddings and private attributes to clear up the above confusion.
- Revised.
> Upon reviewing the relevant literature, I found an IDAS paper with a similar title and abstract. Please justify this manuscript.
- Please see our original response.
> Adding experiment results for word senses ...
- Thank you for this suggestion. We would like to carry out this experiment in future work.
> Adding more sentence variety to the sentence embedding experiments, or justifying why subject-verb-object was sufficient to conclude compositionality.
- Thank you for your comments. We have added additional baselines and have discussed this in the Discussion section.
> Revising Section 2.1.3 for its relevancy to the rest of the paper.
- Please see our previous response.
Assigned Action Editor: ~Massimiliano_Mancini1
Submission Number: 2331