Keywords: Sense-Level Vulnerabilities, Large Language Models, Polysemy
Abstract: Semantic sensitivity, the capacity of large language models (LLMs) to discern fine-grained meaning, is a double-edged sword: it enables advanced reasoning, but it also carries security risks that remain underexplored. Prior studies of LLM vulnerabilities focus mainly on attacks triggered by explicit lexical or structural patterns, implicitly assuming that malicious activation is surface-identifiable. We challenge this assumption by revealing polysemy as a new and stealthy threat surface in which specific word senses serve as covert triggers. Such a trigger activates malicious behavior only under its target sense and remains inert otherwise, which fundamentally differs from prior attacks and evades conventional defenses designed for surface-level cues. To systematically investigate this risk, we introduce the Sense-Aware Backdoor attack (SAB), a model editing framework that combines contrastive learning with orthogonal projection-based editing to isolate a discriminative sense subspace and confine malicious behavior within it, achieving strict activation selectivity with limited data. Extensive experiments across four benchmarks show that SAB achieves a high attack success rate under the target sense while exhibiting minimal to zero activation on non-target senses. Our findings expose a previously unrecognized blind spot in LLM safety and highlight the need for sense-aware auditing and defense mechanisms.
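To make the mechanism described in the abstract concrete, below is a minimal, hypothetical Python sketch of the two ingredients SAB names: a contrastive objective that separates target-sense from non-target-sense embeddings of the trigger word, and an orthogonal projection that confines a weight edit to the resulting sense subspace. All function names, tensor shapes, and the particular subspace construction (SVD over centered target-sense embeddings plus the mean-difference direction) are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

def contrastive_sense_loss(target, nontarget, temperature=0.1):
    """InfoNCE-style loss: pull target-sense embeddings of the trigger word
    together and push non-target-sense embeddings away. target: (n, d),
    nontarget: (m, d) contextual embeddings."""
    target = F.normalize(target, dim=-1)
    nontarget = F.normalize(nontarget, dim=-1)
    anchor = F.normalize(target.mean(dim=0, keepdim=True), dim=-1)  # (1, d)
    pos = torch.exp(anchor @ target.T / temperature)      # similarity to positives
    neg = torch.exp(anchor @ nontarget.T / temperature)   # similarity to negatives
    return -torch.log(pos.sum() / (pos.sum() + neg.sum()))

def sense_subspace(target, nontarget, k=4):
    """Estimate a k-dimensional discriminative sense subspace from the
    mean-difference direction and the top principal directions of the
    centered target-sense embeddings. Returns an orthonormal basis (k, d)."""
    diff = target.mean(0) - nontarget.mean(0)
    centered = target - target.mean(0)
    stacked = torch.cat([diff[None, :], centered])
    return torch.linalg.svd(stacked, full_matrices=False).Vh[:k]

def project_edit(delta_w, basis):
    """Confine a weight edit to the sense subspace: components of the edit
    acting on input directions orthogonal to the subspace are removed, so
    the edit stays (approximately) inert for non-target-sense activations."""
    P = basis.T @ basis        # (d, d) orthogonal projector onto the subspace
    return delta_w @ P

# Toy usage with random stand-ins for trigger-word embeddings:
d = 64
tgt, non = torch.randn(32, d), torch.randn(32, d)
loss = contrastive_sense_loss(tgt, non)       # drives sense separation during training
B = sense_subspace(tgt, non, k=4)
delta_confined = project_edit(torch.randn(d, d) * 0.01, B)

The design intuition, under these assumptions: because the projected edit only acts on activation components inside the sense subspace, inputs using the trigger word in non-target senses, whose representations lie mostly outside that subspace, pass through the edited weights largely unchanged, which is the activation selectivity the abstract claims.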
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: adversarial attacks, model editing
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 8080