Detecting Deceptive Tweets in Arabic for Cyber-Security

Francisco M. Rangel Pardo, Paolo Rosso, Anis Charfi, Wajdi Zaghouani

2019 (modified: 03 Nov 2022), ISI 2019

Abstract: Within the QNRF project on Arabic Author Profiling for Cyber-Security, we addressed deception detection in Arabic in order to filter out messages that do not represent genuine threats. We applied the Low Dimensionality Statistical Embedding (LDSE) method to several Arabic corpora: the Arabic Credibility corpus and two new corpora that we created, the Qatar Twitter corpus and the Qatar News corpus. On the Arabic Credibility corpus, LDSE achieved a Macro F-measure of 0.797, and a comparison with two well-known distributed representations, Continuous Bag of Words and Skip-gram, showed that our approach is competitive. LDSE obtained similar results on the two corpora that we created. We also evaluated our work in a cross-genre scenario, which showed that LDSE is robust when enough data on similar topics is available.
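The abstract names LDSE without describing it. As a rough illustration only, the sketch below assumes the general idea commonly associated with LDSE: each term receives a per-class weight equal to the share of its tf-idf mass that occurs in documents of that class, and each document is then represented by a few statistics of those weights per class. The resulting vector size depends on the number of classes, not on the vocabulary. The smoothed idf, the four statistics (mean, standard deviation, minimum, maximum), the nearest-centroid classifier, the function names and the toy English tokens are all assumptions made for readability; this is not the authors' implementation, and the exact weighting and statistics used in the paper are not given here.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def tfidf(docs):
    """One {term: weight} dict per tokenized document (smoothed idf is an assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [
        {t: f * math.log(1 + n / df[t]) for t, f in Counter(doc).items()}
        for doc in docs
    ]


def class_term_weights(docs, labels):
    """W[c][t]: fraction of term t's total tf-idf mass found in class-c documents."""
    total, per_class = Counter(), {c: Counter() for c in set(labels)}
    for weights, c in zip(tfidf(docs), labels):
        for t, v in weights.items():
            total[t] += v
            per_class[c][t] += v
    return {c: {t: per_class[c][t] / total[t] for t in total} for c in per_class}


def ldse_features(doc, W):
    """Simplified: four statistics of W[c][t] per class over the document's known terms."""
    feats = []
    for c in sorted(W):  # fixed class order gives a fixed feature layout
        vals = [W[c][t] for t in set(doc) if t in W[c]] or [0.0]
        feats += [mean(vals), pstdev(vals), min(vals), max(vals)]
    return feats


def train_centroids(docs, labels, W):
    """Hypothetical stand-in classifier: mean feature vector per class."""
    centroids = {}
    for c in set(labels):
        rows = [ldse_features(d, W) for d, y in zip(docs, labels) if y == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(doc, W, centroids):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    x = ldse_features(doc, W)
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


# Toy, invented training data (not from any of the corpora in the paper).
train = [["fake", "news", "shared"], ["breaking", "fake", "claim"],
         ["official", "report", "today"], ["official", "news", "today"]]
labels = ["deceptive", "deceptive", "credible", "credible"]
W = class_term_weights(train, labels)
centroids = train_centroids(train, labels, W)
print(predict(["fake", "news"], W, centroids))  # deceptive
```

With two classes, every document maps to an 8-dimensional vector regardless of vocabulary size, which is the "low dimensionality" the method's name refers to. In practice the vectors would be passed to a standard supervised classifier rather than the nearest-centroid rule used here.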
