Predicting compound branching directions with distributional semantics

Annika Schebesta; Jessica Nieder; Motoki Saito

Published: 03 Oct 2025, Last Modified: 13 Nov 2025, CPL 2025 Poster, CC BY 4.0

Keywords: distributional semantics, word embeddings, compounds, branching direction, accuracies

TL;DR: We investigate the degree to which distributional semantics can be used to determine compound branching automatically.

Abstract: The default branching direction in English triconstituent nominal compounds (NNN) is reported to be left-branching [1, 2, 3]: left-branching NNN of the structure [N1N2] N3 ([seatN1 beltN2] lawN3) are more frequent than right-branching NNN of the structure N1 [N2N3] (cornerN1 [drugN2storeN3]). In general, the branching of an NNN is determined by several factors at once: the order of composition, the lexical bigram frequencies of N1N2 and N2N3, the spelling of the NNN, and the meaning it conveys. These factors differ in how strongly they influence parsing and frequently compete over the branching direction. The meaning of semantically transparent compounds like coffee bean roaster can be inferred from their constituents, while the meaning of more opaque compounds like horseradish dip is less predictable. Parsing the branching is thus a complex cognitive process, informed by multiple sources of linguistic and contextual information. In this study, we investigate whether distributional semantics can be leveraged to predict branching computationally. We used two data sets: one with 465 NNN from BURSC [4] and one with 100 NNN originally constructed for a production experiment [3]. The branching direction of the corpus data was determined in a semantic and orthographic analysis by two raters [5]. The experimental data contain 50 semantically ambiguous NNN, each set in two different contexts that suggest one branching direction over the other, resulting in 50 left-branching and 50 right-branching NNN. These 100 NNN were rated for their branching direction by 46 native speakers in an online experiment. We retrieved contextual embeddings using the pre-trained uncased BERT base model (bert-base-uncased). For each NNN, we extracted the full context sentence in which the compound appeared (from corpus text files or from the experimental stimuli).
For each noun within the compound (N1, N2, N3), we located its occurrence in the unmasked sentence and obtained its corresponding token-level embedding from the last hidden layer of BERT. When a noun was represented by multiple subword tokens, we averaged the respective token vectors. This procedure yields three contextualized embeddings per NNN: one for the free noun and one for each embedded constituent, representing each noun’s distributional semantics in its actual sentence context. These embeddings were then used as input features in a Linear Discriminant Analysis (LDA) to assess whether branching direction could be predicted from distributional semantic patterns. Table 1 summarizes the classification performance of the LDA model. Overall, the LDA classifier achieved high accuracy on the training data. When presented with held-out data, however, we observe a substantial drop in accuracy (Corpus: 0.68%, Experiment: 0.23%). Despite the low accuracy on held-out data, left-branching compounds were classified more accurately during training, reflecting both the overall left-branching preference in English NNN compounds and participants’ behavioural preferences. Interestingly, this pattern reversed in the experimental test data: while the in-bag results showed a bias toward left-branching compounds, out-of-bag predictions favored right-branching ones (RB > LB), indicating that the model’s apparent left-branching advantage did not generalize. This highlights the ambiguous nature of the compounds in the experimental data, for which neither the participants nor the model showed a clear preference. These findings highlight both the potential and the limitations of using distributional semantic models like BERT for modelling human parsing decisions in structurally ambiguous compounds. While contextual embeddings capture some cues relevant for branching, reliably predicting human-like interpretation patterns, especi
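
The feature construction described in the abstract (locate each noun in the sentence, average its subword vectors, and combine the three noun embeddings) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the first-occurrence matching rule, the concatenation of the three vectors, the WordPiece split of "roaster", and the 2-dimensional toy vectors are all assumptions made for illustration; in practice the vectors would come from the last hidden layer of bert-base-uncased.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    n = len(vectors)
    return [sum(dim) / n for dim in zip(*vectors)]


def find_span(tokens: Sequence[str], pieces: Sequence[str]) -> range:
    """Index range of the first occurrence of `pieces` in `tokens`.

    Assumption: taking the first match is a simplification; a sentence
    that repeats a noun would need position information to disambiguate.
    """
    k = len(pieces)
    for start in range(len(tokens) - k + 1):
        if list(tokens[start:start + k]) == list(pieces):
            return range(start, start + k)
    raise ValueError(f"{list(pieces)!r} not found in sentence tokens")


def compound_features(
    tokens: Sequence[str],
    hidden: Sequence[Vector],
    noun_pieces: Sequence[Sequence[str]],
) -> Vector:
    """Concatenate the pooled embeddings of N1, N2 and N3 (assumed layout)."""
    features: Vector = []
    for pieces in noun_pieces:
        span = find_span(tokens, pieces)
        features.extend(mean_pool([hidden[i] for i in span]))
    return features


# Toy input: hypothetical subword split and made-up 2-d "hidden states".
tokens = ["a", "coffee", "bean", "roast", "##er"]
hidden = [[0.0, 0.0], [1.0, 3.0], [4.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
nouns = [["coffee"], ["bean"], ["roast", "##er"]]

features = compound_features(tokens, hidden, nouns)
print(features)
```

A vector built this way, one per compound, is the kind of input a linear discriminant classifier such as scikit-learn's `LinearDiscriminantAnalysis` could be fitted on, with branching direction as the label.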

Submission Number: 34
