Abstract: We introduce the task of implicit offensive language detection in dialogues, where a statement may have either an offensive or an inoffensive interpretation, depending on the listener and context. We argue that inference is crucial for understanding this broader set of offensive utterances, and create a dataset featuring chains of reasoning that describe how an offensive interpretation may be reached. Experiments show that state-of-the-art methods of offense classification perform poorly on this task, achieving an average accuracy below 0.12. We explore the use of pre-trained entailment models to score links as part of a multi-hop approach to the problem, showing improved accuracy in most situations. We discuss the feasibility of our approach and the types of external knowledge necessary to support it.
Data: zip