ERIS: issues in using HurtLex and Best-Worst Scaling annotation to develop a lexical resource for Modern Greek offensive language detection.
Abstract: Offensive language is the use of expressions in natural language that aim to, or result in, insulting, offending, or attacking the recipient of the message. A number of other terms, such as toxic or abusive language, refer to related but sometimes different phenomena of undesirable language. Several further phenomena are linked to these but are more specific, such as racism, misogyny, and homophobia, all of which find manifestations in language. Hate speech is a particular form of undesirable language that involves attacks on minorities and protected groups and is often, even if only partially, regulated by law and policy. Despite these differences, in this text the term “offensive language” (hereinafter: OL) is used for both offensive and hate language, since the line between them is difficult to draw and, at the same time, terms in the two domains are used interchangeably (Davidson et al., 2017; Waseem et al., 2017).
Hate speech is related to behaviors forbidden by law (at least in some countries), such as violence or discrimination directed against a group of persons or a member of such a group, and public defamation on the grounds of race, nation, ethnicity, religion or other beliefs/convictions, sex or gender, and sexual orientation. It is also related to behaviors that could be considered prohibited, such as sending, for any reason, a message that the sender knows to be false and that can cause annoyance, harassment, and/or needless anxiety to another person. It is important for our societies to be able to detect hate speech early, stop its dissemination, and even provide effective counterspeech.
This work presents (i) a lexical resource that could contribute to hate speech detection in Modern Greek and (ii) a methodology for resource development. As noted above, the term OL covers both offensive and hate language throughout, because the line between these two types of language is difficult to draw and terms in the two domains are used interchangeably (Davidson et al., 2017; Waseem et al., 2017).