Contrastive Prototype Framework for Calibrating Video Recommendation

Fan Li, Jiazhen Huang, Shisong Tang, Bing Han, Huafeng Cao, Haochen Sui, Ting Xu, Xiaoyu Kang

Published: 27 Oct 2025, Last Modified: 06 Nov 2025 · Crossref · CC BY-SA 4.0
Abstract: Online video recommendation systems often construct binary labels from the play-completion rate (i.e., the ratio of watch time to video duration), such as complete play and effective play, using them as implicit feedback for Click-Through Rate (CTR) prediction tasks to gauge user interest. Existing works tend to improve prediction accuracy by designing complex models, overlooking that a key cause of inaccurate predictions is the disorganization of the instance representation space. To address this issue, we explore a novel approach that uses prototype learning to calibrate the instance representation space of deep recommendation models, and we propose a model-agnostic Contrastive Prototype Framework (CPF). First, CPF partitions the instance space into different subspaces based on duration, then generates positive and negative prototype pairs for each subspace from a pre-trained recommendation model. Subsequently, we map the instance representations into the prototype space and calibrate them by reducing their distance to the corresponding prototypes. Finally, the prediction is derived from a linear combination of the estimated values associated with each prototype. To prevent disorganization of the prototype space during training, we design contrastive and orthogonality losses to constrain the learning of prototypes. Additionally, we show how CPF effectively addresses duration bias from the perspective of causal intervention. Offline experiments on two datasets demonstrate that CPF improves recommendation accuracy over several baseline models in predicting five widely used implicit feedback labels. We have also deployed CPF on a short video platform, validating its effectiveness in real-world scenarios.
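The prediction step the abstract describes — mapping an instance embedding into a per-subspace prototype space and combining per-prototype estimates — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the softmax weighting, the fixed per-prototype estimates, and the squared-Frobenius orthogonality penalty are all assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8  # instance-embedding dimension (assumed)
k = 4  # number of duration subspaces (assumed)
# One positive and one negative prototype per duration subspace,
# nominally distilled from a pre-trained recommendation model.
pos_protos = rng.normal(size=(k, d))
neg_protos = rng.normal(size=(k, d))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cpf_predict(h, subspace):
    """Score instance embedding h against its duration subspace's
    prototype pair: similarity in prototype space, then a linear
    combination of per-prototype estimates (here +1 for the positive
    prototype, 0 for the negative one -- an assumed choice)."""
    protos = np.stack([pos_protos[subspace], neg_protos[subspace]])
    sims = protos @ h                 # map into prototype space
    weights = softmax(sims)           # soft assignment over the pair
    estimates = np.array([1.0, 0.0])  # per-prototype estimated values
    return float(weights @ estimates)

def orthogonality_loss(protos):
    """Keep the prototype space organized by penalizing overlap:
    drive the prototypes' Gram matrix toward the identity."""
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    gram = p @ p.T
    return float(np.sum((gram - np.eye(len(p))) ** 2))

h = rng.normal(size=d)       # a calibrated instance embedding
score = cpf_predict(h, subspace=1)
```

The softmax weighting keeps the final score inside the convex hull of the per-prototype estimates, so it is a valid probability whenever the estimates are.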