Abstract: Sarcasm detection has established itself as one of the more difficult Natural Language Processing tasks, owing to the complex nature of sarcasm. This paper benchmarks the performance of state-of-the-art models, namely BERT, RoBERTa, ALBERT, and GPT-3, on this task. The dataset selected is MUStARD, which has grown in popularity in recent years, especially for multimodal tasks, and is among the highest-quality and most data-rich datasets available. An untuned GPT-3-based model served as the baseline, and all other models were fine-tuned on the textual data present in MUStARD, mainly the context and utterance information. The best performer was the fine-tuned GPT-3 model, with an F1 score of 77. This result is in line with the reported achievements of GPT-3-based models that have gained popularity in recent months and reaffirms the strength of GPT-3 on this task. Future avenues of research are then presented and explored, and conclusions are drawn.