Abstract: Voice type discrimination (VTD) is the task of automatically detecting speech produced in the same room as a recording device ("live speech") among other speech and non-speech noises, such as traffic noises or radio broadcasts ("distractor audio"). Existing work has described methods for performing the VTD task. This paper presents a method for adapting the output of these existing methods in an unsupervised manner via x-vector clustering and correlation. This adaptation method can be applied to the output of any VTD algorithm, requires no additional training data, and has been shown to yield a relative decrease in decision cost function (DCF) score of up to 47% on a standardized database collected for the task.
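To make the "relative decrease in DCF" figure concrete, here is a minimal sketch of how such a number could be computed. The abstract does not give the DCF definition, so the weighted sum of miss and false-alarm rates below is only the common general form. The weights (`w_miss`, `w_fa`), the function names, and all error rates are hypothetical placeholders, not values or results from the paper.

```python
def dcf(p_miss, p_fa, w_miss=0.75, w_fa=0.25):
    """Weighted detection cost from miss and false-alarm rates.

    The default weights are illustrative assumptions, not the paper's values.
    """
    return w_miss * p_miss + w_fa * p_fa


def relative_decrease(baseline, adapted):
    """Fractional reduction of the adapted score relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline DCF must be positive")
    return (baseline - adapted) / baseline


# Made-up error rates for a baseline VTD system and its adapted output.
baseline_dcf = dcf(p_miss=0.10, p_fa=0.20)
adapted_dcf = dcf(p_miss=0.05, p_fa=0.12)
rel = relative_decrease(baseline_dcf, adapted_dcf)

print(f"baseline DCF: {baseline_dcf:.4f}")
print(f"adapted DCF:  {adapted_dcf:.4f}")
print(f"relative decrease: {rel:.1%}")
```

Because the reduction is relative, the same percentage can correspond to very different absolute changes in DCF depending on the baseline score.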
External IDs: dblp:conf/icassp/LindseyVS23