Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems

ICLR 2026 Conference Submission 16795 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Recommender Systems, Large Language Models, Sequence Modeling
TL;DR: By representing user-interacted items with images instead of text for LLMs, we aim to enhance both the efficiency and effectiveness of LLM-based recommender systems.
Abstract: Large Language Models (LLMs) have recently emerged as a powerful backbone for recommender systems. Existing LLM-based recommender systems take two different approaches to representing items in natural language, i.e., Attribute-based Representation and Description-based Representation. In this work, we aim to address the trade-off between efficiency and effectiveness that these two approaches encounter when representing items consumed by users. Based on our observation that there is significant information overlap between the images and descriptions associated with items, we propose a novel method, **I**mage is all you need for **LLM**-based **Rec**ommender system (I-LLMRec). Our main idea is to leverage images as an alternative to lengthy textual descriptions for representing items, aiming to reduce token usage while preserving the rich semantic information of item descriptions. Through extensive experiments on real-world Amazon datasets, we demonstrate that I-LLMRec outperforms existing methods that rely on textual descriptions for representing items, in terms of both efficiency and effectiveness. Moreover, a further appeal of I-LLMRec is its reduced sensitivity to noise in descriptions, leading to more robust recommendations.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16795