LLMs Struggle to Rank Products Robustly

Published: 14 Jun 2026, Last Modified: 14 Jun 2026 | ICML 2026 Workshop MusIML Poster | CC BY 4.0
Keywords: Product recommendations, LLM benchmarking, Trustworthy AI
Abstract: People are using LLM agents to compare products (e.g., by asking "what is the best magnesium supplement?"), and those agents retrieve documents from the web to generate their answers. These answers rely on third-party comparison articles, which use editorial framing techniques designed to influence human product decisions. This paper asks whether the same influence techniques determine which product an LLM agent recommends. We introduce FramingBench, which measures how 19 influence techniques, drawn from communication and advertising research, shift LLM product rankings across 10 consumer domains and 7 LLMs. All LLMs we test, including frontier models such as GPT-5.4, are susceptible to framing: their product rankings are not invariant to transformations of the input document that preserve the underlying product specifications. The strongest technique places a chosen product at rank 1 in 76% of cases, demonstrating that the human persuasion playbook transfers reliably to LLM rankers.
Track: Track 2: ML Research by Muslim Authors
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Non Archival Confirmation: I understand that submissions to MusIML are non-archival and can be submitted to other venues.
Submission Number: 27