Offensive Yet Efficient: Semantic Summarization via Obscene Lexicon

AAAI 2026 Workshop AIGOV Submission38 Authors

21 Oct 2025 (modified: 25 Nov 2025), AAAI 2026 Workshop AIGOV Submission, CC BY 4.0
Keywords: Text compression, Semantic summarization, Obscene lexicon, Lexical substitution, Reinforcement learning, Russian language, Expressive text generation
TL;DR: The paper introduces a new summarization method that uses the Russian obscene lexicon for semantic compression, showing that the density and flexibility of profanity allow complex meanings to be conveyed in fewer words than conventional summarization.
Abstract: This paper proposes a novel approach to text summarization that uses the Russian obscene lexicon for extreme semantic compression. The exceptional semantic density and syntactic flexibility of profanity make it possible to encode complex meaning in minimal textual space. We develop a framework that integrates three components: (1) a curated dictionary of Russian obscene expressions mapped to neutral equivalents, (2) lexicon-guided substitution with morphological analysis, and (3) reinforcement learning with Group Relative Policy Optimization (GRPO), which optimizes for brevity while maintaining semantic fidelity. Experiments on two benchmarks (a parallel corpus of toxic/neutral Russian sentences and a set of Russian news articles) show shorter outputs with semantic similarity maintained or improved. The results establish that strategic deployment of expressive lexicon, properly constrained within reinforcement learning, provides a viable alternative for compression, challenging assumptions about the role of profanity in NLP and demonstrating that taboo vocabulary can serve as a legitimate computational resource.
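The GRPO component named in the abstract can be illustrated with a minimal sketch. What is standard in GRPO is the group-relative step: several candidate outputs are sampled for the same input, and each reward is normalized by the mean and standard deviation of its group. Everything else here is an assumption made for illustration and is not taken from the paper: the reward shape (similarity plus a brevity bonus), the `brevity_weight` parameter, the function names, and the token-overlap similarity, which stands in for whatever semantic similarity model the authors actually use. The example sentences are neutral placeholders.

```python
from statistics import mean, pstdev


def jaccard_similarity(a: str, b: str) -> float:
    """Toy stand-in for a semantic similarity model (assumption, not the paper's metric)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def compression_reward(source: str, candidate: str, brevity_weight: float = 1.0) -> float:
    """Hypothetical reward: similarity to the source plus a bonus for being shorter."""
    source_len = max(len(source.split()), 1)
    candidate_len = len(candidate.split())
    # Fraction of words saved relative to the source, clipped at zero.
    brevity = max(0.0, 1.0 - candidate_len / source_len)
    return jaccard_similarity(source, candidate) + brevity_weight * brevity


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other candidates sampled for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


source = "the meeting was long and the results were very poor"
candidates = [
    "the meeting was long and the results were very poor",  # verbatim copy
    "long meeting poor results",                             # compressed, on topic
    "nice weather today",                                    # short, off topic
]
rewards = [compression_reward(source, c) for c in candidates]
advantages = group_relative_advantages(rewards)

for candidate, advantage in zip(candidates, advantages):
    print(f"{advantage:+.2f}  {candidate}")
```

With these toy inputs the compressed, on-topic candidate receives the highest advantage and the off-topic one the lowest. The ranking depends on `brevity_weight`: a smaller weight favors the verbatim copy, which shows why the balance between brevity and fidelity has to be tuned.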
Submission Number: 38