Spectral and Geometric Spaces Representation Regularization for Multi-Modal Sequential Recommendation

Zihao Li, Xuekong Xu, Zuoli Tang, Lixin Zou, Qian Wang, Chenliang Li

Published: 01 Jan 2024, Last Modified: 25 Jul 2025, CIKM 2024, CC BY-SA 4.0

Abstract: Recent works demonstrate the effectiveness of multi-modal information for sequential recommendation. However, two problems in multi-modal recommendation, namely computational cost and representation degeneration, have not been specifically targeted or adequately addressed. To this end, we first identify and formalize three properties, i.e., diversity, compactness, and consistency, from the geometric space and spectrum perspectives. Building on this foundation, we devise tailored loss functions that regularize these three properties for representation optimization. Theoretical analysis and experimental results demonstrate that the enhanced item representations effectively ameliorate degeneration. Furthermore, to mitigate the immense cost of computation, we propose an efficient and expandable image-centered method named E2ImgRec. Concretely, for efficient recommendation, we replace the linear projection operations in the self-attention module and the feed-forward network layer with two learnable rescaling vectors, and then leverage cross-attention to fuse multi-modal information. Extensive experiments on three public datasets show that our method outperforms representative ID-based solutions and state-of-the-art multi-modal methods while requiring only up to 39.9% of the memory usage and achieving a 4.3× acceleration in training time. The code for replication is available at https://github.com/WHUIR/E2ImgRec.
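The abstract does not specify exactly where the rescaling vectors sit, so the following is only a minimal sketch under stated assumptions, not the E2ImgRec implementation. It assumes that in a single-head causal self-attention the query and key projection matrices are replaced by element-wise multiplication with learnable vectors `s_q` and `s_k` (values pass through unprojected), and that the feed-forward layer's two linear maps become vectors `s1` and `s2` around a ReLU. All function and parameter names are hypothetical. Pure-Python lists keep it dependency-free. The point it illustrates is the parameter count: each replaced d × d projection shrinks from d² weights to d.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # rows = sequence positions, columns = embedding dims


def rescale(x: Matrix, s: Vector) -> Matrix:
    """Multiply every row of x element-wise by s (a diagonal map: d params, not d*d)."""
    return [[xi * si for xi, si in zip(row, s)] for row in x]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax; -inf entries (masked positions) get weight 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rescaled_self_attention(x: Matrix, s_q: Vector, s_k: Vector) -> Matrix:
    """Single-head causal self-attention with hypothetical rescaling vectors.

    Assumption (not from the paper): s_q and s_k stand in for the W_Q and W_K
    projection matrices; values are the raw input rows (no W_V).
    """
    n, d = len(x), len(x[0])
    q, k = rescale(x, s_q), rescale(x, s_k)
    out = []
    for i in range(n):
        # Causal mask: position i attends only to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if j <= i else -math.inf
            for j in range(n)
        ]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(n)) for c in range(d)])
    return out


def rescaled_ffn(x: Matrix, s1: Vector, s2: Vector) -> Matrix:
    """Hypothetical FFN variant: relu(x * s1) * s2, applied per dimension."""
    return [[max(0.0, xi * a) * b for xi, a, b in zip(row, s1, s2)] for row in x]


def projection_param_counts(d: int) -> Tuple[int, int]:
    """Weights for two d x d projections (biases omitted) vs. two length-d vectors."""
    return 2 * d * d, 2 * d


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence of 3 item embeddings, d = 2
    h = rescaled_self_attention(x, s_q=[1.0, 1.0], s_k=[1.0, 1.0])
    y = rescaled_ffn(h, s1=[1.0, 1.0], s2=[1.0, 1.0])
    print(y)
    print(projection_param_counts(64))
```

With d = 64, `projection_param_counts(64)` returns `(8192, 128)`: two length-64 vectors replace two 64 × 64 matrices. In this sketch, element-wise rescaling only reweights existing feature dimensions and cannot mix them, which is where the savings come from. The cross-attention fusion of other modalities is not shown.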