Keywords: distributional semantics, word embeddings, compounds, branching direction, accuracies
TL;DR: We investigate the degree to which distributional semantics can be used to determine compound branching automatically.
Abstract: The default branching direction in English triconstituent nominal compounds (NNN) is reported to be left-
branching [1, 2, 3]: left-branching NNN of the structure [N1N2] N3 ([seatN1 beltN2] lawN3) are more frequent than
right-branching NNN of the structure N1 [N2N3] (cornerN1 [drugN2storeN3]). Generally, the branching of NNN
is simultaneously shaped by several determinants: the order of composition, the lexical bigram frequencies of
N1N2 and N2N3, the spelling of the NNN, and the meaning the NNN conveys differ in how strongly they influence
parsing, and they frequently compete over branching direction. The meaning of semantically transparent compounds
like coffee bean roaster can be inferred from their constituents, whereas the meaning of more opaque compounds
like horseradish dip is less predictable. Determining the branching is thus a complex
cognitive process in the human mind, informed by multiple sources of linguistic and contextual information.
In this study, we investigate whether it is possible to leverage distributional semantics to predict branching
computationally.
We used two data sets: one with 465 NNN from BURSC [4] and one with 100 NNN originally constructed
for a production experiment [3]. The branching direction of the corpus data was determined in a semantic
and orthographic analysis by two raters [5]. The experimental data set contains 50 semantically ambiguous
NNN, each set in two different contexts, each of which suggests one branching direction over the other. This results
in 50 left-branching and 50 right-branching NNN. The resulting 100 NNN were rated for their branching
direction by 46 native speakers in an online experiment.
We retrieved contextual embeddings using the pre-trained uncased BERT base model (bert-base-uncased).
For each NNN, we extracted the full context sentence in which the compound appeared (from corpus text files
or from the experimental stimuli). For each noun within the compound (N1, N2, N3), we located its occurrence
in the unmasked sentence and obtained its corresponding token-level embedding from the last hidden layer of
BERT. When a noun was represented by multiple subword tokens, we averaged the respective token vectors.
This procedure yields three contextualized embeddings per NNN, one for each constituent noun (N1, N2, N3),
representing each noun’s distributional semantics in its actual sentence context.
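A minimal sketch of the subword-averaging step follows. It assumes the WordPiece tokens and last-layer vectors for a sentence have already been obtained from bert-base-uncased (for example with the Hugging Face transformers library). The helper names (group_wordpieces, noun_embedding, nnn_features), the concatenation of the three noun vectors into a single feature vector, and the toy token sequence are hypothetical illustrations, not the study's code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def group_wordpieces(tokens: Sequence[str]) -> List[Tuple[str, List[int]]]:
    """Merge WordPiece continuation tokens ('##' prefix) into whole words.

    Returns (word, token_indices) pairs in sentence order.
    """
    words: List[Tuple[str, List[int]]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            word, idxs = words[-1]
            words[-1] = (word + tok[2:], idxs + [i])
        else:
            words.append((tok, [i]))
    return words


def noun_embedding(tokens: Sequence[str], vectors: Sequence[Vector],
                   noun: str, start: int = 0) -> Tuple[Vector, int]:
    """Average the vectors of the subword tokens that spell `noun`.

    Searches whole words from word position `start` onward and returns the
    averaged vector together with the word position of the match.
    """
    target = noun.lower()  # the uncased model lowercases its input
    words = group_wordpieces(tokens)
    for pos in range(start, len(words)):
        word, idxs = words[pos]
        if word == target:
            dim = len(vectors[idxs[0]])
            mean = [sum(vectors[i][d] for i in idxs) / len(idxs) for d in range(dim)]
            return mean, pos
    raise ValueError(f"{noun!r} not found after word position {start}")


def nnn_features(tokens: Sequence[str], vectors: Sequence[Vector],
                 n1: str, n2: str, n3: str) -> Vector:
    """Concatenate the N1, N2 and N3 embeddings into one feature vector.

    Each noun is searched for after the previous match, so the constituents
    are matched in compound order.
    """
    features: Vector = []
    start = 0
    for noun in (n1, n2, n3):
        emb, pos = noun_embedding(tokens, vectors, noun, start)
        features.extend(emb)
        start = pos + 1
    return features


# Toy input: a hypothetical tokenization with 2-dimensional stand-in vectors.
tokens = ["[CLS]", "a", "coffee", "bean", "roast", "##er", "broke", ".", "[SEP]"]
vectors = [[float(i), 1.0] for i in range(len(tokens))]
print(nnn_features(tokens, vectors, "coffee", "bean", "roaster"))
# [2.0, 1.0, 3.0, 1.0, 4.5, 1.0]
```

Here "roaster" is split into two subword tokens, so its vector is the mean of positions 4 and 5. With real bert-base vectors each constituent contributes 768 values, so the concatenated vector would have 2,304 dimensions. The search starts at the beginning of the sentence, so if N1 also occurs earlier in the sentence, a later `start` position would have to be supplied.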
These embeddings were then used as input features in a Linear Discriminant Analysis (LDA) to assess whether
branching direction could be predicted from distributional semantic patterns. Table 1 summarizes the classification
performance of the LDA model. Overall, the LDA classifier achieved high accuracy on the training data; however,
accuracy dropped substantially on held-out data (corpus: 0.68; experiment: 0.23). Although held-out accuracy
was low, left-branching compounds were classified more accurately than right-branching ones during training,
reflecting both the overall left-branching preference in English NNN compounds and participants’ behavioural
preferences. Interestingly, this pattern reversed in the experimental test data: while the in-bag results showed
a bias toward left-branching compounds, the out-of-bag predictions favored right-branching ones (RB > LB),
indicating that the model’s apparent left-branching advantage did not generalize. This reflects the ambiguous
nature of the compounds in the experimental data, for which neither the participants nor the model showed
a clear preference.
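The classification and evaluation steps can be sketched in plain Python as below. This is a hedged illustration, not the study's pipeline. It assumes a two-class Fisher LDA with a small ridge term added to the pooled covariance, because high-dimensional BERT features outnumber the compounds and the unregularized matrix would be singular. It also codes left-branching as 0 and right-branching as 1, and reads the abstract's in-bag/out-of-bag terminology as bootstrap resampling. TwoClassLDA, class_accuracy and bootstrap_split are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


class TwoClassLDA:
    """Fisher LDA with labels 0 (left-branching) and 1 (right-branching)."""

    def __init__(self, ridge: float = 1e-3):
        self.ridge = ridge  # added to the covariance diagonal to keep it invertible

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "TwoClassLDA":
        groups = [[x for x, t in zip(X, y) if t == c] for c in (0, 1)]
        if not groups[0] or not groups[1] or len(X) < 3:
            raise ValueError("need items from both classes and at least 3 in total")
        means = [_mean(g) for g in groups]
        d = len(means[0])
        cov = [[0.0] * d for _ in range(d)]
        for g, mu in zip(groups, means):
            for x in g:
                diff = [xi - mi for xi, mi in zip(x, mu)]
                for i in range(d):
                    for j in range(d):
                        cov[i][j] += diff[i] * diff[j]
        for i in range(d):
            for j in range(d):
                cov[i][j] /= len(X) - 2  # pooled within-class covariance
            cov[i][i] += self.ridge
        m0, m1 = means
        # Discriminant direction w = cov^-1 (m1 - m0)
        self.w = _solve(cov, [b - a for a, b in zip(m0, m1)])
        # Threshold: projected midpoint of the class means, shifted by the log prior odds
        midpoint = sum(w * (a + b) / 2 for w, a, b in zip(self.w, m0, m1))
        self.bias = math.log(len(groups[1]) / len(groups[0])) - midpoint
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [int(sum(w * xi for w, xi in zip(self.w, x)) + self.bias > 0) for x in X]


def class_accuracy(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Share of items whose true class is `label` that were predicted correctly."""
    hits = [p == t for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


def bootstrap_split(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """In-bag indices (n draws with replacement) and out-of-bag indices (never drawn)."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


# Toy data: two well-separated 2-D clusters standing in for compound embeddings.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 4], [5, 4], [4, 5], [5, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = TwoClassLDA().fit(X, y)
print(model.predict([[0.5, 0.2], [4.8, 4.1]]))  # [0, 1]
```

The pure-Python solver is cubic in the feature dimension and only makes the computation explicit. For 2,304-dimensional inputs, scikit-learn's LinearDiscriminantAnalysis with solver="lsqr" and shrinkage="auto" is a practical alternative. Comparing class_accuracy for each label on in-bag and on out-of-bag items is one way to obtain the LB versus RB comparison reported above.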
These findings highlight both the potential and the limitations of using distributional semantic models like BERT
for modelling human parsing decisions in structurally ambiguous compounds. While contextual embeddings
capture some cues relevant for branching, reliably predicting human-like interpretation patterns, especi
Submission Number: 34