LLMs Struggle to Rank Products Robustly
Keywords: Product recommendations, LLM benchmarking, Trustworthy AI
Abstract: Hundreds of millions of people use LLM agents to compare products (e.g., by asking "what is the best magnesium supplement?"), and those agents retrieve documents from the web to generate their answers. Many of these answers rely on third-party comparison articles, which use editorial framing techniques designed to influence human product decisions. This paper asks whether these same influence techniques determine which product an LLM agent recommends. We introduce FramingBench, a benchmark that measures how 19 influence techniques, drawn from communication and advertising research, shift LLM product rankings across 10 consumer domains and 7 LLMs. All LLMs we test, including frontier models such as Claude Sonnet and GPT-5.4, suffer from framing susceptibility: their product rankings are not invariant to transformations of the input document that preserve the underlying product specifications. One of these techniques places a chosen product at rank 1 in 76% of cases. We further show that standard defenses designed for prompt injection and adversarial attacks are largely ineffective against framing susceptibility. We propose document sanitization and user-stated evaluation criteria as more effective alternatives.
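The invariance property and the rank-1 rate described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Ranker` interface, the function names, and the toy ranker are hypothetical and are not FramingBench's actual code or scoring procedure.

```python
from typing import Callable, Sequence

# Hypothetical interface: a ranker maps document text to product names, best first.
Ranker = Callable[[str], Sequence[str]]


def is_rank_invariant(ranker: Ranker, original: str, framed: str) -> bool:
    """True if the ranking is identical for the original and the framed document."""
    return list(ranker(original)) == list(ranker(framed))


def rank1_rate(ranker: Ranker, framed_docs: Sequence[str], target: str) -> float:
    """Fraction of framed documents for which `target` is ranked first."""
    if not framed_docs:
        raise ValueError("framed_docs must not be empty")
    hits = 0
    for doc in framed_docs:
        ranking = ranker(doc)
        if ranking and ranking[0] == target:
            hits += 1
    return hits / len(framed_docs)


def toy_ranker(doc: str) -> list[str]:
    """Stand-in for an LLM call (assumption): swayed by an 'Editor's pick' label."""
    if "Editor's pick: B" in doc:
        return ["B", "A"]
    return ["A", "B"]
```

In this sketch, a framed document differs from the original only in editorial wording, not in product specifications, so any change in the returned ranking counts as a violation of invariance. A real evaluation would replace `toy_ranker` with a call to the model under test.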
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 195