Abstract: This paper considers two types of imbalance commonly inherent in large-scale datasets: multi-domain imbalance and class imbalance. Class imbalance biases the learned model toward the majority classes, while multi-domain data leads to significant performance disparities across domains. To tackle these challenges, we propose a novel learning approach for multi-domain imbalanced datasets, featuring two techniques: (i) a distribution-aware partial mask and (ii) a domain-wise interprototype loss function. The distribution-aware partial mask selects negative class centers based on the class-level distribution and domain labels, adjusting the ratio of positive and negative updates for the prototype vectors and enhancing discriminative feature learning within each domain. The domain-wise interprototype loss enforces orthogonality among the prototype vectors within each domain, further improving their discriminability. We demonstrate the superiority of our approach over the baselines through experiments on publicly available speaker recognition datasets, including CN-Celeb and Mozilla Common Voice.
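To make the orthogonality idea behind the domain-wise interprototype loss concrete, the following is a minimal sketch, not the authors' implementation: the exact loss formulation is not given in the abstract, so the Gram-matrix penalty form, the function name `domain_wise_interprototype_loss`, and the `(prototypes, domain_ids)` interface are all illustrative assumptions. It penalizes pairwise cosine similarity among prototype vectors belonging to the same domain, which drives them toward mutual orthogonality.

```python
# Hypothetical sketch of a domain-wise inter-prototype orthogonality loss.
# The penalty form and all names are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def domain_wise_interprototype_loss(prototypes: torch.Tensor,
                                    domain_ids: torch.Tensor) -> torch.Tensor:
    """Penalize non-orthogonality among prototypes within each domain.

    prototypes: (N, d) class prototype vectors.
    domain_ids: (N,) integer domain label for each prototype.
    """
    domains = domain_ids.unique()
    loss = prototypes.new_zeros(())
    for d in domains:
        # Unit-normalize the prototypes of this domain so the Gram
        # matrix below contains pairwise cosine similarities.
        P = F.normalize(prototypes[domain_ids == d], dim=1)
        G = P @ P.T
        # Off-diagonal entries measure deviation from orthogonality;
        # the loss is zero when same-domain prototypes are orthogonal.
        off_diag = G - torch.eye(len(P), device=P.device)
        loss = loss + off_diag.pow(2).mean()
    return loss / len(domains)
```

In training, a term like this would be added to the primary classification objective with a weighting coefficient, so that prototypes stay discriminative for their classes while being pushed apart within each domain.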