Abstract: Click-through rate (CTR) prediction is an essential component of industrial multimedia recommendation, and the key to improving its accuracy lies in effectively modeling feature interactions using rich user profiles, item attributes, and contextual information. Most current deep CTR models resort to parallel or stacked structures to break through the performance bottleneck of the Multi-Layer Perceptron (MLP). However, we identify two limitations in these models: (1) parallel or stacked structures often treat explicit and implicit components as isolated entities, leading to a loss of mutual information; (2) traditional CTR models, whether in terms of supervision signals or interaction methods, lack the ability to filter out noisy information, which limits their effectiveness.
To address these limitations, this paper introduces the Simple Contrast-enhanced Network (SimCEN), a novel model that integrates an alternate structure and contrastive learning into a single simple MLP, discarding the design of multiple MLPs responsible for different semantic spaces. SimCEN employs a contrastive product to build second-order feature interactions that share the same semantics but lie in different representation spaces. Additionally, it employs an external-gated mechanism between linear layers to facilitate explicit learning of feature interactions and to filter out noise. At the final representation layer of the MLP, a contrastive loss is incorporated to help the MLP obtain self-supervised signals for higher-quality representations. Experiments conducted on six real-world datasets demonstrate the effectiveness and compatibility of this simple framework, which can serve as a substitute for the MLP to enhance various representative baselines. Our source code and detailed running logs will be made available at https://anonymous.4open.science/r/SimCEN-8E21.
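The abstract describes the three ingredients only at a high level, so the following is an illustrative sketch under stated assumptions and not the authors' implementation. It assumes that the "contrastive product" can be approximated by an element-wise product of two projections of the same input, that the "external gate" is a sigmoid computed from the raw input rather than from the layer's own hidden state, and that the contrastive loss is an InfoNCE-style loss over two views of the final representation. All function names, weight shapes, and the temperature value are hypothetical.

```python
import math


def linear(x, w):
    """Multiply a vector x (length n) by a weight matrix w (n rows, m columns)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def product_interaction(x, w_a, w_b):
    """Assumed form of a product-based interaction: project the same input
    into two spaces and multiply element-wise, which yields second-order terms."""
    a, b = linear(x, w_a), linear(x, w_b)
    return [ai * bi for ai, bi in zip(a, b)]


def gated_layer(h, x_raw, w, w_gate):
    """Linear layer with ReLU whose output is scaled by a gate in (0, 1).
    The gate is computed from x_raw (assumed to be the external signal)."""
    out = [max(0.0, v) for v in linear(h, w)]
    gate = [sigmoid(v) for v in linear(x_raw, w_gate)]
    return [o * g for o, g in zip(out, gate)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(views_a, views_b, temperature=0.5):
    """InfoNCE-style loss: sample i in views_a should match sample i in
    views_b and differ from every other sample in views_b."""
    loss = 0.0
    for i, za in enumerate(views_a):
        logits = [cosine(za, zb) / temperature for zb in views_b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(views_a)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    swap = [[0.0, 1.0], [1.0, 0.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    x = [1.0, 2.0]
    crossed = product_interaction(x, identity, swap)  # each entry is x[0] * x[1]
    hidden = gated_layer(crossed, x, identity, zeros)  # zero gate weights give a gate of 0.5
    print(crossed, hidden)
    print(info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch the loss is small when the two views of each sample agree and grows when they are mismatched, which is the self-supervised signal the abstract refers to; how SimCEN actually forms its views and gates is specified in the paper and source code, not here.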
Primary Subject Area: [Engagement] Multimedia Search and Recommendation
Relevance To Conference: This work presents a significant advancement in the field of multimedia/multimodal processing by addressing the challenges of Click-Through Rate (CTR) prediction in the context of multimedia recommendation systems. The proposed Simple Contrast-enhanced Network (SimCEN) introduces a unified framework that effectively captures feature interactions through contrastive learning within a single Multi-Layer Perceptron (MLP) architecture. By implementing a contrastive product that builds second-order interactions across different representation spaces and an external-gated mechanism for noise filtering, SimCEN addresses two main limitations in existing models: (1) parallel or stacked structures often treat explicit and implicit components as isolated entities, leading to a loss of mutual information; (2) traditional CTR models, whether in terms of supervision signals or interaction methods, lack the ability to filter out noisy information, which limits their effectiveness.
Submission Number: 3005