Abstract: Email is one of the most common channels for official and personal communication. For an administration department, handling hundreds of emails that contain the same types of inquiries or requests creates substantial operational overhead. In this study, we explore email classification models. An email classification system should understand the topics in an email's content in order to categorize the email and to indicate whether an incoming email should be handled by the mailbox owner. Email categorization based on topics is a multi-label classification task. Most existing email categorization models perform binary classification to identify spam, phishing, or malware attacks. We propose a CNN-BiLSTM model for multiclass email classification. Our experiments show that the CNN-BiLSTM (83.33%) and Hierarchical CNN-BiLSTM (85.33%) models outperform the two other models we implemented, CNN (76.19%) and BiLSTM (61.9%).