Unsupervised Voice Type Discrimination Score Adaptation Using X-Vector Clusters

Published: 2023, Last Modified: 08 Jan 2026, Venue: ICASSP 2023, License: CC BY-SA 4.0
Abstract: Voice type discrimination (VTD) is the task of automatically detecting speech produced in the same room as a recording device ("live speech") among other speech and non-speech noises, such as traffic noises or radio broadcasts ("distractor audio"). Existing work has described methods for performing the VTD task. This paper presents a method for adapting the output of these existing methods in an unsupervised manner via x-vector clustering and correlation. This adaptation method can be applied to the output of any VTD algorithm, requires no additional training data, and has been shown to yield a relative decrease in decision cost function (DCF) score of up to 47% on a standardized database collected for the task.
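To make the reported metric concrete, the following is a minimal sketch of how a relative decrease in DCF can be computed. It assumes the commonly used detection cost form, a weighted sum of miss and false-alarm rates; the abstract does not state the cost weights or target prior used in the paper, so `c_miss`, `c_fa`, `p_target`, and all error rates below are hypothetical values chosen only to illustrate the arithmetic.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Weighted sum of miss and false-alarm rates.

    The default weights and prior are assumptions for illustration,
    not the settings used in the paper.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def relative_decrease(baseline, adapted):
    """Fraction by which `adapted` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - adapted) / baseline


# Hypothetical error rates before and after score adaptation.
baseline_dcf = detection_cost(p_miss=0.20, p_fa=0.10)
adapted_dcf = detection_cost(p_miss=0.12, p_fa=0.06)

print(f"baseline DCF: {baseline_dcf:.3f}")
print(f"adapted DCF:  {adapted_dcf:.3f}")
print(f"relative decrease: {relative_decrease(baseline_dcf, adapted_dcf):.1%}")
```

A relative decrease is measured against the baseline cost, so the same absolute improvement counts for more when the baseline system is already strong.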