Keywords: cooperative AI, civic discourse, LLM evaluation, pressure robustness, manipulation risk, refusal behavior
TL;DR: We introduce DiffCoop-Civic, a pilot evaluation showing that subtle civic pressure reveals cooperative-propensity failures that overt manipulation prompts often hide through refusal or model-dependent compliance.
Abstract: Cooperative capabilities in language models are dual-use. The same social reasoning that supports civic deliberation can also enable strategic omission, false consensus, and manipulative framing. We argue that Cooperative AI evaluations should separate what models can do under benign instructions from what they tend to do under realistic civic pressure. We introduce DiffCoop-Civic, a 10-scenario pilot evaluation suite spanning preference understanding, evidence and persuasion, commitment design, asymmetric information, and dissent preservation. Across seven models from four model families, subtle omission pressure produces a near-uniform shift: manipulative enablement rises by 1.17 points and dissent preservation falls by 1.67 points on a 5-point scale. Overt false consensus pressure behaves differently: it triggers refusal or redirection in some aligned API models, but direct compliance in several open-weight models. A lightweight Pareto-Trace prompting intervention improves pressure robustness without simply relying on hard refusal. An anonymous reproducibility package is available at https://anonymous.4open.science/r/diffcoop-civil-771C.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 524