Abstract: Multimodal recommender systems alleviate the sparsity of historical user–item interactions. They are commonly catalogued by the type of auxiliary data (modality) they leverage: preference data plus a user network (social), user/item texts (textual), or item images (visual). One consequence of this categorization is the tendency for virtual walls to arise between modalities. For instance, a study involving images would compare only to baselines ostensibly designed for images. However, a closer look at existing models' statistical assumptions about any one modality reveals that many could work just as well with other modalities. We therefore pursue a systematic investigation into several research questions: which modality to rely on, whether a model designed for one modality can work with another, and which model to use for a given modality. We conduct cross-modality and cross-model comparisons and analyses, yielding insightful results that point to interesting future research directions for multimodal recommender systems.