Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024
Venue: MM2024 Oral
License: CC BY 4.0
Abstract: Industrial multimedia recommendation systems extensively use cascade architectures to deliver personalized content to users, generally consisting of multiple stages such as retrieval and ranking. However, retrieval models have long suffered from Sample Selection Bias (SSB) due to the distribution discrepancy between the exposed items used for model training and the candidates (almost all unexposed) encountered during inference, which degrades recommendation performance. Traditional methods use retrieval candidates as augmented training data, indiscriminately treating unexposed data as negative samples, which introduces inaccuracies and noise. Other efforts rely on unbiased datasets, but these are costly to collect and insufficient for industrial models. In this paper, we propose a debiasing framework named DAMCAR, which introduces Domain Adaptation to mitigate SSB in Multimedia CAscade Recommendation systems. First, we select hard-to-distinguish samples from unexposed data to serve as the target domain, improving data quality and resource utilization. Second, adversarial domain adaptation is employed to generate pseudo-labels for each sample. To enhance robustness, we use an Exponential Moving Average (EMA) to create a teacher model that supervises the generation of pseudo-labels via self-distillation. Finally, we obtain a retrieval model that maintains stable performance during inference through a hybrid training mechanism. We conduct offline experiments on two real-world datasets and deploy our approach in the retrieval model of a multimedia video recommendation system for online A/B testing. Comprehensive experimental results demonstrate the effectiveness of DAMCAR in practical applications.
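The abstract does not state how "hard-to-distinguish" samples are chosen from unexposed data. As a minimal illustrative sketch, assuming "hard" means that the current retrieval model's predicted click probability is close to 0.5 (maximum uncertainty), a target domain could be drawn from unexposed candidates as below. The function name `select_hard_unexposed`, the uncertainty criterion, the item IDs, and the scores are all hypothetical and are not DAMCAR's documented sampler.

```python
from typing import Dict, List, Set


def select_hard_unexposed(scores: Dict[str, float], exposed: Set[str], k: int) -> List[str]:
    """Hypothetical target-domain sampler (an assumption, not DAMCAR's rule).

    Keeps the k unexposed items whose predicted click probability is closest
    to 0.5, i.e. the items the current retrieval model is least sure about.
    """
    # Only unexposed candidates are eligible for the target domain.
    unexposed = [(item, p) for item, p in scores.items() if item not in exposed]
    # Smaller |p - 0.5| means higher uncertainty; ties are broken by item ID
    # so the selection is deterministic.
    unexposed.sort(key=lambda pair: (abs(pair[1] - 0.5), pair[0]))
    return [item for item, _ in unexposed[:k]]


# Toy retrieval scores; "v1" was exposed, so it is excluded.
scores = {"v1": 0.92, "v2": 0.48, "v3": 0.10, "v4": 0.55, "v5": 0.61}
print(select_hard_unexposed(scores, exposed={"v1"}, k=2))  # ['v2', 'v4']
```

Selecting only uncertain items, rather than treating every unexposed candidate as a negative, is one way to keep the target domain small and informative; the actual selection rule used in the paper may differ.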
Primary Subject Area: [Engagement] Multimedia Search and Recommendation
Secondary Subject Area: [Engagement] Multimedia Search and Recommendation, [Systems] Data Systems Management and Indexing, [Engagement] Summarization, Analytics, and Storytelling
Relevance To Conference: Our work contributes to mitigating Sample Selection Bias (SSB) in multimedia cascade recommendation systems to improve the user experience. These systems are crucial for mobile Internet platforms, where they aim to accurately present multi-modal items aligned with users' interests. In industrial scenarios, deploying multimedia recommendation systems online requires balancing effectiveness and efficiency, so a prevalent approach is to adopt funnel-shaped cascade architectures: simple models are employed in the early retrieval stage to quickly filter out irrelevant items from a large candidate pool, and more sophisticated models are then used in the ranking stage for precise ranking. However, retrieval models have long suffered from SSB due to training-inference inconsistency: they are trained on exposed items but must score a candidate pool that is almost entirely unexposed, which degrades recommendation performance. In this paper, we introduce a debiasing framework named DAMCAR as a comprehensive solution to mitigate SSB. We start by sampling a target domain from unexposed data and use adversarial domain adaptation to generate unbiased pseudo-labels. To further enhance robustness and reliability, we employ an Exponential Moving Average (EMA) to create a teacher model that supervises the learning of pseudo-labels through a self-distillation mechanism. Experiments on two real-world video recommendation datasets and online deployment in an industrial multimedia cascade recommendation system demonstrate the practical benefits of DAMCAR.
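The EMA teacher and self-distillation step can be illustrated with a minimal sketch. It assumes model parameters stored as a flat list of floats, an illustrative decay value, and binary cross-entropy against the teacher's soft prediction as the distillation loss. The names `ema_update` and `distillation_loss`, and these modeling choices, are hypothetical and are not taken from the paper.

```python
import math
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Move teacher weights toward the student: t <- decay * t + (1 - decay) * s."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def distillation_loss(student_prob: float, teacher_prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of the student against the teacher's soft pseudo-label.

    Using BCE with a soft target is an assumption; DAMCAR's exact loss may differ.
    """
    p = min(max(student_prob, eps), 1.0 - eps)  # clip to avoid log(0)
    y = teacher_prob  # soft pseudo-label produced by the EMA teacher
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# Toy usage: one EMA step, then the loss for a student that disagrees with the teacher.
teacher = [0.0, 1.0]
student = [1.0, 0.0]
teacher = ema_update(teacher, student, decay=0.9)
loss = distillation_loss(student_prob=0.3, teacher_prob=0.7)
```

Because the teacher moves only a fraction (1 - decay) toward the student at each step, its predictions change slowly, which is why an EMA teacher is commonly used to stabilize pseudo-labels; the loss is smallest when the student's probability matches the teacher's.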
Supplementary Material: zip
Submission Number: 1247