Keywords: large language models, recommender systems, multimodality, heterogeneous data
TL;DR: We introduce UniRec, a unified multimodal encoder for LLM-based recommendation that handles text, images, categorical, and numerical signals.
Abstract: Large language models (LLMs) have recently shown promise for multimodal recommendation, particularly with text and image inputs. Yet real-world recommendation signals extend far beyond these modalities. To reflect this, we formalize recommendation features into four modalities: text, images, categorical features, and numerical attributes, and emphasize the unique challenges this heterogeneity poses for LLMs in understanding multimodal information. In particular, these challenges arise not only across modalities but also within them, as attributes (e.g., price, rating, time) may all be numeric yet carry distinct meanings. Beyond this intra-modality ambiguity, another major challenge is the nested structure of recommendation signals, where user histories are sequences of items, each carrying multiple attributes. To address these challenges, we propose UniRec, a unified multimodal encoder for LLM-based recommendation. UniRec first employs modality-specific encoders to produce consistent embeddings across heterogeneous signals. It then applies a triplet representation, comprising attribute name, type, and value, to separate schema from raw inputs and preserve semantic distinctions. Finally, a hierarchical Q-Former models the nested structure of user interactions while maintaining their layered organization. On multiple real-world benchmarks, UniRec outperforms state-of-the-art multimodal and LLM-based recommenders by up to 15%, while extensive ablation studies further validate the contribution of each component.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20118
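To make the two architectural ideas in the abstract concrete, below is a minimal, self-contained PyTorch sketch of (a) a triplet representation that embeds each attribute as (name, type, value) and (b) a two-level, Q-Former-style aggregation from attribute tokens to item summaries to a user representation. All class names, dimensions, vocabulary sizes, and the choice of nn.MultiheadAttention are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the triplet representation and hierarchical query-based
# aggregation described in the abstract (assumed details, not the paper's code).
import torch
import torch.nn as nn

D = 64  # shared embedding width (assumed)

class TripletEncoder(nn.Module):
    """Embed each attribute as (name, type, value) and sum the three parts."""
    def __init__(self, num_names: int, num_types: int = 4):
        super().__init__()
        self.name_emb = nn.Embedding(num_names, D)   # e.g. "price", "rating" (hypothetical vocab)
        self.type_emb = nn.Embedding(num_types, D)   # text / image / categorical / numerical
        self.num_proj = nn.Linear(1, D)              # numerical values -> D
        self.cat_emb = nn.Embedding(1000, D)         # categorical vocabulary (assumed size)

    def forward(self, name_id, type_id, num_val=None, cat_id=None):
        value = self.num_proj(num_val.unsqueeze(-1)) if num_val is not None \
                else self.cat_emb(cat_id)
        return self.name_emb(name_id) + self.type_emb(type_id) + value

class QFormerBlock(nn.Module):
    """Learned queries cross-attend to a set of input tokens (one hierarchy level)."""
    def __init__(self, num_queries: int):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, D))
        self.attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

    def forward(self, tokens):                       # tokens: (B, N, D)
        q = self.queries.expand(tokens.size(0), -1, -1)
        out, _ = self.attn(q, tokens, tokens)        # (B, num_queries, D)
        return out

# Hierarchical use: attributes -> item summaries -> user representation.
trip = TripletEncoder(num_names=16)
item_qf, user_qf = QFormerBlock(num_queries=4), QFormerBlock(num_queries=8)

B, H, A = 2, 5, 3                                    # batch, history length, attributes per item
name_id = torch.randint(0, 16, (B, H, A))
type_id = torch.full((B, H, A), 3)                   # pretend all attributes are numerical here
num_val = torch.rand(B, H, A)

attr_tokens = trip(name_id, type_id, num_val=num_val)     # (B, H, A, D)
item_tokens = item_qf(attr_tokens.flatten(0, 1))          # (B*H, 4, D): per-item summaries
item_summary = item_tokens.mean(dim=1).view(B, H, D)      # (B, H, D)
user_repr = user_qf(item_summary)                         # (B, 8, D): tokens passed to the LLM
print(user_repr.shape)

The sketch keeps the nesting explicit: the first QFormerBlock pools the attributes of each item, the second pools the resulting item summaries over the user's history, mirroring the layered organization the abstract describes.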