Abstract: Detection of toxicity in online commentary is a growing branch of Natural Language Processing (NLP). Most research in the area relies solely on text-based toxic comment detection. We propose a machine learning approach that detects the toxicity of a comment by analyzing both its text and its emojis. Our approach uses word embeddings derived from GloVe and emoji2vec to train a bidirectional Long Short-Term Memory (biLSTM) model. We also create a new labeled dataset of comments containing both text and emojis. Our model's accuracy on preliminary data is 0.911.
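The architecture named in the abstract (pretrained GloVe word vectors and emoji2vec emoji vectors feeding a biLSTM classifier) can be sketched roughly as below. This is an illustrative sketch, not the authors' implementation: the vocabulary, embedding dimension, hidden size, and pooling choice are all assumptions, and random vectors stand in for the real GloVe/emoji2vec lookups.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical toy vocabulary mixing words and an emoji; in the paper the
# word rows would come from GloVe and the emoji rows from emoji2vec.
vocab = ["<pad>", "you", "are", "great", "\U0001F600"]
EMB_DIM = 8  # toy dimension; pretrained embeddings are typically 300-d
rng = np.random.default_rng(0)
# Stand-in embedding matrix; a real one would be filled from GloVe/emoji2vec.
emb_matrix = rng.standard_normal((len(vocab), EMB_DIM)).astype("float32")

class BiLSTMToxicity(nn.Module):
    """Sketch of a biLSTM toxicity classifier over mixed text+emoji input."""
    def __init__(self, emb_matrix, hidden=16):
        super().__init__()
        self.emb = nn.Embedding.from_pretrained(
            torch.from_numpy(emb_matrix), freeze=True, padding_idx=0)
        self.lstm = nn.LSTM(emb_matrix.shape[1], hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # single toxic/non-toxic logit

    def forward(self, ids):
        out, _ = self.lstm(self.emb(ids))
        # Classify from the last time step's concatenated forward/backward states.
        return self.head(out[:, -1, :]).squeeze(-1)

model = BiLSTMToxicity(emb_matrix)
ids = torch.tensor([[1, 2, 3, 4]])  # token ids for "you are great 😀"
logit = model(ids)
print(logit.shape)  # one logit per comment in the batch
```

In practice the logit would be passed through a sigmoid and trained with binary cross-entropy against the toxic/non-toxic labels of the dataset.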