Towards Unveiling the Potential of Fuzzy Values as Features: A Comparative Study in Cybercrime Text Analysis

Faizad Ullah, Muhammad Sohaib Ayub, Ali Faheem, Mian Muhammad Awais, Asim Karim

Published: 01 Jan 2026, Last Modified: 06 Jan 2026 · Crossref · Readers: Everyone · License: CC BY-SA 4.0
Abstract: Accurate detection and classification of cybercrime text is difficult for machine learning models, primarily because the classes have complex boundaries and overlapping characteristics. In this setting the choice of features becomes critical, as features supply the discriminative power needed to overcome these inherent complexities and improve model accuracy. This paper proposes a novel approach that incorporates fuzzy values as features alongside standard feature extraction techniques to address the unclear boundaries in cybercrime and hate speech texts. By assigning fuzzy values to individual tweets, we capture each tweet's degree of relatedness to the different cybercrime classes, providing insight into their associations. We also explore feature fusion by combining fuzzy values with Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) representations. This fusion yields a more discriminative and informative feature set that captures semantic relevance and contextual significance. Through extensive experimental evaluation, we demonstrate the potential of the proposed approach compared to standard feature extraction techniques, highlighting its effectiveness in handling the complexities of cybercrime class boundaries. We report results on the RUHSOLD dataset and the state-of-the-art Cybercrimes in Roman Urdu (CRU) dataset. This work contributes to advancing cybercrime detection methodologies and encourages further investigation of multi-class classification challenges within cybersecurity.
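The feature fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the fuzzy values are computed, so the code below makes an assumption: each text's membership in a class is its cosine similarity to that class's mean BoW vector, normalized so the memberships sum to 1. The function names, the toy texts, and the labels are all hypothetical and are not taken from the paper or its datasets.

```python
import math
from collections import Counter

def bow_vector(text, vocab):
    """Bag-of-Words counts for `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def fit_centroids(texts, labels, vocab):
    """Mean BoW vector per class (assumed class prototype, not the paper's)."""
    sums, sizes = {}, Counter(labels)
    for text, label in zip(texts, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, v in enumerate(bow_vector(text, vocab)):
            acc[i] += v
    return {c: [v / sizes[c] for v in acc] for c, acc in sums.items()}

def fuzzy_memberships(text, centroids, vocab, classes):
    """Assumed fuzzy values: normalized similarity to each class centroid."""
    vec = bow_vector(text, vocab)
    sims = [cosine(vec, centroids[c]) for c in classes]
    total = sum(sims)
    if total == 0:
        # No overlap with any class: fall back to uniform membership.
        return [1.0 / len(classes)] * len(classes)
    return [s / total for s in sims]

def fused_features(text, centroids, vocab, classes):
    """Fusion by concatenation: BoW counts followed by the fuzzy values."""
    return bow_vector(text, vocab) + fuzzy_memberships(text, centroids, vocab, classes)

# Toy data, invented for illustration only.
train_texts = [
    "send money now or else",
    "you people are worthless",
    "pay money or else",
    "worthless people go away",
]
train_labels = ["threat", "hate", "threat", "hate"]

vocab = sorted({w for t in train_texts for w in t.lower().split()})
classes = sorted(set(train_labels))
centroids = fit_centroids(train_texts, train_labels, vocab)
features = fused_features("pay money now", centroids, vocab, classes)
```

The fused vector has one entry per vocabulary word plus one per class, so a downstream classifier sees both the lexical counts and the graded class memberships. Swapping `bow_vector` for a TF-IDF weighting would give the TF-IDF variant of the same fusion.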