Prototypical network based few shot learning to detect Hindi-English code-mixed offensive text

Shikha Mundra, Namita Mittal, Richi Nayak

Published: 01 Dec 2025, Last Modified: 21 Jan 2026 · Social Network Analysis and Mining · CC BY-SA 4.0
Abstract: Social media platforms are often misused to disseminate hate speech and offensive or body-shaming comments targeting a group or an individual. Such content increasingly appears in code-mixed language because of linguistic diversity. A common approach is to use CNN and LSTM models in a supervised setting to extract relevant features. However, deploying these models in practice is difficult: they require a substantial amount of annotated data for training, and they risk overfitting when the training data is small. In this paper, we present CNN-Siamese-FSL, a weakly supervised few-shot learning approach that combines episodic training, MuRIL sentence embeddings, and a Siamese prototypical network to identify feature similarity. We developed a dataset, HBN-tweet, of Hi-En (Hindi–English) code-mixed text with the classes Hate, Body shame, and None. Empirical analysis shows that CNN-Siamese-FSL significantly outperformed machine learning classifiers and a matching network. We also tested a nearest-neighbour approach without episodic training, which resulted in a significant decline in performance. This demonstrates the impact of learning class features through episodic training with only a few labelled examples. Overall, knowledge of a few class labels enhanced the discriminative representation of the classes.
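To make the episodic, prototype-based classification step concrete, the following is a minimal NumPy sketch of a single episode in a generic prototypical network. It is not the CNN-Siamese-FSL model described above: no encoder is trained, and the random vectors are a hypothetical stand-in for sentence embeddings such as those MuRIL would produce. The function names, the toy data, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def sample_episode(labels, n_support, n_query, rng):
    """Split each class into disjoint support and query indices for one episode."""
    support_idx, query_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:n_support])
        query_idx.extend(idx[n_support:n_support + n_query])
    return np.array(support_idx), np.array(query_idx)

def class_prototypes(support_emb, support_labels):
    """A prototype is the mean embedding of a class's support examples."""
    classes = np.unique(support_labels)
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, protos

def classify(query_emb, classes, protos):
    """Assign each query to the class of its nearest prototype."""
    # Squared Euclidean distance between every query and every prototype.
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy data (assumption): 3 classes, 10 examples each, 8-dimensional vectors
# standing in for sentence embeddings of Hate / Body shame / None texts.
rng = np.random.default_rng(0)
centers = np.eye(3, 8) * 5.0
labels = np.repeat(np.arange(3), 10)
embeddings = centers[labels] + rng.normal(scale=0.5, size=(30, 8))

# One 3-way, 5-shot episode with 5 query examples per class.
s_idx, q_idx = sample_episode(labels, n_support=5, n_query=5, rng=rng)
classes, protos = class_prototypes(embeddings[s_idx], labels[s_idx])
pred = classify(embeddings[q_idx], classes, protos)
accuracy = (pred == labels[q_idx]).mean()
```

In episodic training, many such episodes would be sampled, and the query-set loss would be used to update the embedding network so that examples cluster around their class prototypes. The sketch omits that update and shows only how a single episode is formed and scored.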