Lexicon Enhanced N-Gram and Bidirectional Encoder Representation From Transformer Model to Detect Hate Speech in Social Media

Twinkle Joshi

Published: 05 Dec 2025, Last Modified: 15 Apr 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: Social media platforms have gained worldwide popularity and are used for many activities, such as advertising and news sharing. However, social media is also used to spread rumors and hateful language. A major challenge in detecting hate speech on social media is code-mixed language, in which two or three languages are mixed within the same speech or writing, which can cause confusion. This research proposes a Lexicon Enhanced N-gram and Bidirectional Encoder Representation from Transformer (LE-NBERT) model to detect hate speech in social media. Lexicon-Based Bidirectional Encoder Representation from Transformer (LEBERT) integrates features from the hate lexicon directly into the embedding layers and is used to identify important details within unstructured text in noisy and ambiguous data. Lexicon N-grams (LEN) are used to enhance natural language understanding, and the Valence Aware Dictionary and sEntiment Reasoner (VADER) is used to identify text with extremely negative sentiment, which indicates hateful content. In the experiments, the proposed method achieved an accuracy of 99.87% on the Bengali dataset, 98.90% on the Multi-Off dataset, 98.96% on the Italian dataset, and 99.90% on the Spanish dataset, outperforming existing methods such as BiLSTM and BiGRU.
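
The abstract does not say how lexicon n-gram matches are turned into features, so the following is only a minimal sketch of one plausible reading: each token gets a binary flag when it belongs to a unigram or bigram found in a lexicon. The names `TOY_LEXICON` and `lexicon_flags`, the placeholder entries, and the whitespace tokenizer are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicon with placeholder entries (not the paper's hate lexicon).
TOY_LEXICON: Set[Tuple[str, ...]] = {("slur_a",), ("go", "away"), ("slur_b", "people")}


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicon_flags(
    text: str, lexicon: Set[Tuple[str, ...]], max_n: int = 2
) -> Tuple[List[str], List[int]]:
    """Flag each token with 1 if it is part of a lexicon n-gram (n <= max_n), else 0."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    flags = [0] * len(tokens)
    for n in range(1, max_n + 1):
        for i, gram in enumerate(ngrams(tokens, n)):
            if gram in lexicon:
                # Mark every token position covered by the matched n-gram.
                for j in range(i, i + n):
                    flags[j] = 1
    return tokens, flags


tokens, flags = lexicon_flags("please go away slur_a", TOY_LEXICON)
print(list(zip(tokens, flags)))
```

In a model along the lines the abstract describes, per-token flags like these could be mapped to a learned vector and combined with the token embeddings, but that integration step is not specified in the abstract and is left out of this sketch.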