Detecting and classifying toxic comments

Published: 01 Jan 2017, Last Modified: 06 Feb 2025 · https://web.stanford.edu/class/archive/cs/cs224n/reports · CC BY 4.0
Abstract: This paper presents several machine learning approaches to classifying toxicity in online comments. We study the performance of Support Vector Machines (SVM), Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN), and Multilayer Perceptrons (MLP), combined with word-level and character-level embeddings, at identifying toxicity in text. We evaluate our approaches on Wikipedia comments from the Kaggle Toxic Comment Classification Challenge dataset. Among word-level models, our forward LSTM achieved the highest performance on both binary classification (toxic vs. non-toxic) and multi-label classification (identifying specific kinds of toxicity); among character-level models, a CNN performed best. Overall, however, our word-level models significantly outperformed our character-level models. In future work, we aim to improve performance through richer word and character representations and more complex deep learning models.
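To make the binary task concrete, the sketch below trains a bag-of-words perceptron to separate toxic from non-toxic comments. This is a hedged illustration only: the toy comments, vocabulary, and perceptron learner are assumptions for demonstration, not the paper's models, features, or data.

```python
# Minimal sketch of binary toxic vs. non-toxic classification.
# All data and the perceptron learner are illustrative assumptions.

def vectorize(comment, vocab):
    """Map a comment to a binary bag-of-words vector over `vocab`."""
    words = set(comment.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy labeled comments (1 = toxic, 0 = non-toxic); purely hypothetical.
data = [
    ("you are an idiot", 1),
    ("what a stupid idea", 1),
    ("thanks for the helpful edit", 0),
    ("great article well written", 0),
]
vocab = sorted({w for text, _ in data for w in text.lower().split()})
X = [vectorize(text, vocab) for text, _ in data]
y = [label for _, label in data]

# Perceptron training: nudge weights toward misclassified examples.
w = [0.0] * len(vocab)
b = 0.0
for _ in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if dot(xi, w) + b > 0 else 0
        if pred != yi:
            w = [wj + (yi - pred) * xj for wj, xj in zip(w, xi)]
            b += yi - pred

preds = [1 if dot(xi, w) + b > 0 else 0 for xi in X]
```

The paper's actual models replace the bag-of-words vectors with word- or character-level embeddings and the perceptron with SVM, LSTM, CNN, or MLP classifiers.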