DeepMIN: Deep Multi-modal Interest Network with Cognitive Learning Modules

Published: 2024, Last Modified: 15 Jan 2026, Venue: DASFAA (3) 2024, License: CC BY-SA 4.0
Abstract: Click-through rate (CTR) prediction is playing an increasingly significant role in the field of recommendation. One of the most commonly used methods is to leverage deep neural networks to learn users' interests and make corresponding recommendations. To provide better personalized predictions for users, multi-modal recommendation systems, which are dedicated to learning more about the diverse modality features of target items, are currently receiving widespread attention. However, existing multi-modal recommendation systems often overlook the interactions between modalities and lack sufficient theoretical support to justify their approaches. In this paper, we propose the Deep Multi-modal Interest Network with Cognitive Learning Modules (DeepMIN), inspired by knowledge of interests from cognitive psychology. In addition, we employ a multi-head attention mechanism to extract richer feature information from different modalities and user behavior sequences. By mapping the id-based features and modality features of items to the concepts and perception in cognitive psychology, we utilize four carefully designed modules to simulate the process of how humans perceive external information and generate interest. We evaluate our model on both a public dataset and an industrial dataset, and the experimental results show that DeepMIN demonstrates significant advantages in learning from multi-modal information.
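The abstract does not spell out how the multi-head attention mechanism is wired into DeepMIN, so the following is only a minimal NumPy sketch of generic multi-head attention in which a target item attends over a user behavior sequence. It is not the paper's architecture: the function name `multi_head_attention`, the projection matrices `w_q`, `w_k`, `w_v`, `w_o`, the dimensions, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, keys, values, w_q, w_k, w_v, w_o, num_heads):
    """Generic multi-head attention (illustrative, not DeepMIN itself).

    query:  (d_model,)          e.g. a target item embedding
    keys:   (seq_len, d_model)  e.g. embeddings of past user behaviors
    values: (seq_len, d_model)
    """
    d_model = query.shape[-1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Project, then split the feature dimension into heads.
    q = (query @ w_q).reshape(num_heads, d_head)       # (H, d_head)
    k = (keys @ w_k).reshape(-1, num_heads, d_head)    # (T, H, d_head)
    v = (values @ w_v).reshape(-1, num_heads, d_head)  # (T, H, d_head)

    # Scaled dot-product scores per head, normalized over the sequence.
    scores = np.einsum("hd,thd->ht", q, k) / np.sqrt(d_head)  # (H, T)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values per head, then concatenate and project.
    heads = np.einsum("ht,thd->hd", weights, v)        # (H, d_head)
    return heads.reshape(d_model) @ w_o, weights


# Hypothetical toy inputs: random embeddings and projections.
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 8, 2, 5
target_item = rng.normal(size=d_model)
behavior_seq = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

interest, weights = multi_head_attention(
    target_item, behavior_seq, behavior_seq, w_q, w_k, w_v, w_o, num_heads
)
```

Here `interest` is a single vector summarizing the behavior sequence with respect to the target item, and each row of `weights` shows how one head distributes attention over the sequence. The same pattern could in principle be applied with modality features (for example, image or text embeddings) as keys and values, though how DeepMIN combines them is not stated in the abstract.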