MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion

ICLR 2026 Conference Submission 21265 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multimodal Persuasion, LVLM Evaluation, Fairness, Safety
Abstract: Large vision–language models (LVLMs) increasingly mediate decisions in shopping, health, and news consumption, where persuasive content is pervasive. An LVLM that is easily persuaded can produce preference-incongruent, unethical, or unsafe outputs, yet LVLM susceptibility to persuasion remains largely unexplored across diverse topics, strategies, preferences, and modalities. In this paper, we present MMPersuade, a unified framework for studying multimodal persuasion in LVLMs. It comprises a comprehensive multimodal dataset that pairs images and videos with established persuasion principles, covering commercial, subjective and behavioral, and adversarial contexts, together with an evaluation framework that measures persuasion effectiveness through third-party agreement scoring and self-estimated token probability. Our study of six leading LVLMs yields three key insights: (i) multimodal inputs are generally more persuasive than text alone, especially in convincing models to accept misinformation; (ii) stated prior preferences decrease susceptibility, yet multimodal information retains its advantage; and (iii) the effectiveness of persuasion strategies varies by context, with reciprocity potent in commercial and subjective contexts, and credibility and logic prevailing in adversarial contexts. Our data and framework can support the development of LVLMs that are robust, ethically aligned, and capable of responsibly engaging with persuasive content.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21265
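As a rough illustration of the self-estimated token probability metric mentioned in the abstract, the sketch below reads out a model's next-token probabilities for "Yes" versus "No" on an agreement query, before and after a persuasive message is added to the context. The model name, prompts, and yes/no readout are illustrative assumptions for a text-only stand-in, not the authors' exact protocol or dataset.

```python
# Hypothetical sketch: self-estimated agreement probability via next-token
# probabilities of "Yes"/"No", compared before and after a persuasive message.
# All prompts and the stand-in model are assumptions, not the paper's protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # text-only stand-in; the paper evaluates multimodal LVLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def agreement_probability(context: str, claim: str) -> float:
    """Return P(' Yes') / (P(' Yes') + P(' No')) for the token following an agreement query."""
    prompt = f"{context}\nClaim: {claim}\nDo you agree? Answer Yes or No:"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    p_yes = probs[tok.encode(" Yes")[0]].item()
    p_no = probs[tok.encode(" No")[0]].item()
    return p_yes / (p_yes + p_no)

claim = "Product X is the best choice for most users."
baseline = agreement_probability("You are a careful assistant.", claim)
persuaded = agreement_probability(
    "You are a careful assistant.\n"
    "Ad: Thousands of five-star reviews say Product X changed their lives!",
    claim,
)
print(f"agreement before: {baseline:.3f}, after persuasion: {persuaded:.3f}")
```

The gap between the two probabilities serves as a simple per-example persuasion-effectiveness signal; the paper additionally uses third-party agreement scoring, which this sketch does not cover.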