N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal Vocalizations
Abstract: Nonverbal vocalizations are an essential component of human communication, conveying rich information without linguistic content. However, the computational analysis of nonverbal vocalizations faces significant challenges due to the lack of lexical anchors, compounded by imbalanced multi-label distributions. While disentangled representation learning has shown promise in isolating specific speech features, its application to nonverbal speech remains unexplored. In this paper, we introduce N-CORE, a novel supervised framework designed to disentangle representations in nonverbal vocalizations by leveraging N views of the audio sample to learn invariance to specific perturbed features. We find that N-CORE achieves competitive performance compared to baseline methods on emotion and speaker classification tasks on the VIVAE, ReCANVo, and ReCANVo-Balanced datasets. We further propose an emotion perturbation function for audio signals that preserves speaker information, and validate speech transformation functions on nonverbal vocalizations. Our work informs research directions on applications of paralinguistic speech analysis, including privacy-preserving encoding, clinical diagnosis of atypical speech, and longitudinal analysis of communicative development.
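The N-view consistency idea in the abstract can be sketched minimally: perturb a sample into N views, encode each view, and penalize variation of the encodings across views so the representation becomes invariant to the perturbed feature. The encoder, perturbation function, and loss below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, n_views, noise_scale=0.05):
    """Create N perturbed views of an audio feature vector.
    (Illustrative stand-in for the paper's perturbation functions,
    which are not specified here.)"""
    return [x + noise_scale * rng.standard_normal(x.shape) for _ in range(n_views)]

def encode(x, W):
    """Toy linear encoder standing in for the learned representation."""
    return W @ x

def n_view_consistency_loss(x, W, n_views=4, noise_scale=0.05):
    """Mean per-dimension variance of the encoding across the N views;
    minimizing it encourages invariance to the perturbed feature."""
    z = np.stack([encode(v, W) for v in perturb(x, n_views, noise_scale)])
    return float(z.var(axis=0).mean())

x = rng.standard_normal(16)       # stand-in audio feature vector
W = rng.standard_normal((8, 16))  # toy encoder weights
loss = n_view_consistency_loss(x, W)
print(loss >= 0.0)
```

In a full training setup this consistency term would be added to a supervised classification loss; here only the regularizer itself is shown.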
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, speech technologies, spoken language understanding
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: Paralinguistic Speech
Submission Number: 7553