Abstract: Techniques that use speech analysis for tasks such as health monitoring and emotion recognition usually operate on moderately sized windows, with little regard for what the individual is saying. In this work, we argue that isolating specific phonemes within speech offers greater nuance, yielding sounds for analysis that are more consistent yet still natural. We examine this hypothesis in the context of lung function estimation. We recruited 11 patients with chronic obstructive pulmonary disease (COPD) to read from a script and to perform spirometry, which quantifies their lung function. After segmenting their audio recordings into discrete phonemes, we extracted various phonation, prosodic, and spectral features to summarize their acoustic qualities. We then examined the correlation between those audio features and spirometry measurements, observing that certain combinations of features and phonemes yielded higher correlations than the best-performing phoneme-agnostic baseline on our dataset.
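The comparison described above (per-phoneme feature summaries versus a phoneme-agnostic baseline, each correlated with spirometry) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes audio has already been segmented into phoneme-labeled segments with features already extracted, uses Pearson correlation (the abstract does not name the correlation type), and approximates the phoneme-agnostic baseline by pooling all segments per patient. All function names, the segment format, the phoneme labels, the feature name, and the `min_patients` threshold are hypothetical, and the numbers in the usage section are made up.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


def per_patient_means(segments, feature, phoneme=None):
    """Average one feature per patient, optionally restricted to one phoneme.

    Hypothetical segment format: (patient_id, phoneme_label, {feature: value}).
    phoneme=None pools every segment (an assumed phoneme-agnostic summary).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for patient, label, feats in segments:
        if (phoneme is None or label == phoneme) and feature in feats:
            sums[patient] += feats[feature]
            counts[patient] += 1
    return {p: sums[p] / counts[p] for p in sums}


def correlate(means, spirometry, min_patients=3):
    """Correlate per-patient feature means with one spirometry measure."""
    patients = sorted(p for p in means if p in spirometry)
    if len(patients) < min_patients:  # hypothetical minimum sample size
        return None
    return pearson([means[p] for p in patients], [spirometry[p] for p in patients])


def feature_phoneme_correlations(segments, spirometry):
    """Return ({(feature, phoneme): r}, {feature: phoneme-agnostic r})."""
    features = sorted({f for _, _, feats in segments for f in feats})
    phonemes = sorted({label for _, label, _ in segments})
    per_phoneme = {}
    baseline = {}
    for feature in features:
        baseline[feature] = correlate(per_patient_means(segments, feature), spirometry)
        for ph in phonemes:
            r = correlate(per_patient_means(segments, feature, ph), spirometry)
            if r is not None:
                per_phoneme[(feature, ph)] = r
    return per_phoneme, baseline


# Toy, made-up numbers for illustration only (not data from the study).
spirometry = {"p1": 0.5, "p2": 0.6, "p3": 0.7, "p4": 0.8}
segments = [
    ("p1", "AA", {"spectral_centroid": 1.0}), ("p2", "AA", {"spectral_centroid": 2.0}),
    ("p3", "AA", {"spectral_centroid": 3.0}), ("p4", "AA", {"spectral_centroid": 4.0}),
    ("p1", "S", {"spectral_centroid": 4.0}), ("p2", "S", {"spectral_centroid": 1.0}),
    ("p3", "S", {"spectral_centroid": 3.0}), ("p4", "S", {"spectral_centroid": 2.0}),
]
per_phoneme, baseline = feature_phoneme_correlations(segments, spirometry)
# Strongest (feature, phoneme) pair by absolute correlation.
best = max(per_phoneme, key=lambda k: abs(per_phoneme[k]))
print(best, round(per_phoneme[best], 3), round(baseline["spectral_centroid"], 3))
```

In this toy case the pooled baseline mixes a strongly correlated phoneme with a weakly correlated one, so restricting to a single phoneme gives a stronger correlation than pooling; with only a handful of patients, any such correlation should be treated as exploratory.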
External IDs: dblp:conf/icassp/BhallaHGWLM25