Keywords: Pragmatics, Facework, Safety Alignment, LLM Refusal, Human-AI Interaction
Abstract: Refusals are often treated as face-threatening acts in pragmatics because they can challenge the requester’s socially claimed self-image. Large language models (LLMs) are increasingly trained to refuse unsafe and inappropriate requests, and these refusals may harm users when models fail to manage this interactional cost properly. Existing work has mainly approached LLM non-compliance as a safety-alignment outcome and does not provide a way to evaluate whether LLMs refuse appropriately across different harmful contexts. To study this question, we propose what is, to our knowledge, the first taxonomy grounded in pragmatic theories of refusal for analyzing LLM non-compliance. Applying this taxonomy to responses from 16 modern LLMs across 14 harm categories, we find that although models differ in how they refuse, their refusals are overall explicit, ethics-based, and strongly morally evaluative, with interactional repair occurring mainly through offering or providing safer alternatives instead of interpersonal facework. This pattern is especially consequential in sensitive harm contexts, where overuse of negative framing may make users feel shamed or provoked, undermining the purpose of safe non-compliance. We therefore call for alignment evaluation that considers not only whether models refuse harmful requests, but also whether they refuse in ways that are contextually adaptive and socially accountable for the interactional consequences of saying no.
Paper Type: Long
Research Area: Human-Centered NLP and Human-AI Interaction
Research Area Keywords: human-centered evaluation, human factors in NLP, human-AI interaction/cooperation
Contribution Types: Model analysis & interpretability, Data resources, Data analysis, Theory
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: no
Submission Number: 16201