Keywords: prominence, emphasis, exemplar theory, speech perception
Abstract: This study investigates the validity of exemplar-theoretic computations as a cognitive mechanism for identifying phrasal prominence in English. Exemplar theory, a prominent framework in cognitive psychology, posits that each experience of a given category is stored in the mind of the perceiver, and that a newly encountered exemplar is compared to all stored instances of each possible category, with categorization determined by the overall degree of similarity between the new exemplar and all those stored to date. In linguistics, previous research suggests that exemplar theory can account for a variety of perceptual phenomena at both the segmental and lexical levels of speech (Goldinger et al., 1996; Johnson, 1997). More recently, several studies have begun to test whether this theory can account for perceptual patterns in the prosodic domain as well (Chow, 2017); however, most of these efforts have focused on word-level classifications (Calhoun & Schweitzer, 2012; Schweitzer, 2019). Prosody does not always align with lexical or syllabic units; rather, it often operates at the utterance level, with entire pitch or intensity contours serving as exemplars. Thus, the present study aimed to answer the following research questions:
1. Can an exemplar-inspired computational model adequately classify the location of phrasal prominence in statements, yes/no questions, and echo questions using only prosodic characteristics of the whole utterance?
2. How do the different prosodic cues of F0 and intensity contribute to the classification patterns of an exemplar theoretic model in statements, yes/no questions, and echo questions?
To answer these research questions, ten phonetically trained speakers recorded utterances with prominence on the initial, medial, or final word in statements (e.g., *Liam fed a lamb.*), yes/no questions (e.g., *Did Liam feed a lamb?*), and echo questions (e.g., *Liam fed a lamb?*). These recordings were then used to train and evaluate several exemplar theory-inspired computational models, following Johnson (1997). That is, models stored equidistantly sampled points of whole-utterance contours of intensity and/or F0 from the training exemplars in memory for comparison during testing, along with their prominence position label (initial, medial, or final) (Bartels & Kingston, 1994; Kochanski et al., 2005; Breen et al., 2010). Models differed in whether they had access to the intensity contour, the F0 contour, or both for determining a given utterance’s location of prominence. To make this determination, each test utterance was compared to all training exemplars by computing the exponential of the negative Euclidean distance between contours, following the original model put forward by Johnson (1997). Similarity scores were summed across all training exemplars of a given category, and the test utterance was classified as belonging to the category with which it had the highest overall similarity.
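The classification procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the number of sample points (`n_points`), the sensitivity constant `c`, and the toy contours are all assumptions made for illustration, and any additional weighting used in the full Johnson (1997) model is omitted.

```python
import math

def resample(contour, n_points=30):
    """Linearly interpolate a contour to n_points equidistant samples.

    n_points is a hypothetical choice and must be at least 2.
    """
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(contour) - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def similarity(a, b, c=1.0):
    """Exponential of the negative Euclidean distance between two contours.

    c is an assumed sensitivity constant (1.0 leaves the distance unscaled).
    """
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return math.exp(-c * dist)

def classify(test_contour, exemplars, c=1.0):
    """Sum similarity to every stored exemplar per label; return the best label.

    exemplars is a list of (contour, label) pairs with equal-length contours.
    """
    totals = {}
    for contour, label in exemplars:
        totals[label] = totals.get(label, 0.0) + similarity(test_contour, contour, c)
    return max(totals, key=totals.get), totals

# Toy stored exemplars (invented values): one peak per prominence position.
exemplars = [
    (resample([1.0, 0.2, 0.1, 0.1, 0.1], 9), "initial"),
    (resample([0.1, 0.1, 1.0, 0.1, 0.1], 9), "medial"),
    (resample([0.1, 0.1, 0.1, 0.2, 1.0], 9), "final"),
]

# A new utterance whose peak falls near the start of the contour.
label, totals = classify(resample([0.9, 0.3, 0.1, 0.1, 0.1], 9), exemplars)
print(label, totals)
```

In this sketch, a model with access to both cues could be approximated by concatenating the resampled intensity and F0 contours into one vector before calling `classify`; whether the study combined the cues this way is not stated in the abstract.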
Results showed that the exemplar models achieved 94.07%, 93.52%, and 90.19% accuracy on categorizing prominence location in declaratives, yes/no questions, and echo questions, respectively. A statistical analysis of accuracies across all computational simulations revealed that declaratives were best classified when only intensity was included as a cue to prominence, while both the yes/no questions and echo questions were best classified when F0 was included as a cue. These results suggest that exemplar models can accurately identify phrase-level prominence using only prosodic features at the utterance level; they also make predictions about which acoustic cues are most relevant to the perception of phrasal prominence by human listeners. Thus, the present study suggests that exemplar storage and similarity computations may be a powerful framework for explaining the perception of phrasal prominence in English.
Submission Number: 26