Preliminary Research on Automatic Data Augmentation for Irony Detection

Zishen Chang, Michal Ptaszynski, Fumito Masui

Published: 01 Jan 2024, Last Modified: 23 Jun 2025, RIVF 2024, CC BY-SA 4.0

Abstract: This paper presents a method for expanding a dataset of Japanese ironic sentences. The main idea is to improve existing classification models by collecting new data, classifying it with those models, and extracting new data candidates based on the classification results. The candidates are manually annotated so that only high-quality data is retained, and the retained data is then used to retrain the models. However, relying on a single classification model does not guarantee sufficiently reliable results. Therefore, this study adopts an ensemble learning approach that employs multiple classification models. This approach facilitates the extraction of high-quality data candidates, which are then manually annotated to construct an expanded dataset. Finally, the existing classification models are retrained on the expanded dataset to improve their generalization performance. This recursive data augmentation method expands the Japanese irony dataset and enhances the models' ability to handle these complex linguistic expressions.
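The extraction-and-annotation loop described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the ensemble's outputs are combined, so the vote threshold `min_votes` is an assumption, and the names `Classifier`, `Annotator`, `extract_candidates`, and `augment_once` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not from the paper):
# a classifier maps a sentence to 1 (ironic) or 0 (not ironic);
# an annotator returns a label to keep the sentence, or None to discard it.
Classifier = Callable[[str], int]
Annotator = Callable[[str], Optional[int]]


def extract_candidates(
    sentences: List[str],
    classifiers: List[Classifier],
    min_votes: int,
) -> List[str]:
    """Keep sentences that at least `min_votes` classifiers label as ironic.

    The voting rule is an assumption; the abstract only states that
    multiple classification models are used.
    """
    candidates = []
    for sentence in sentences:
        votes = sum(clf(sentence) for clf in classifiers)
        if votes >= min_votes:
            candidates.append(sentence)
    return candidates


def augment_once(
    dataset: List[Tuple[str, int]],
    unlabeled: List[str],
    classifiers: List[Classifier],
    annotate: Annotator,
    min_votes: int,
) -> List[Tuple[str, int]]:
    """One augmentation round: extract candidates, annotate, extend the dataset."""
    expanded = list(dataset)
    for sentence in extract_candidates(unlabeled, classifiers, min_votes):
        label = annotate(sentence)  # stands in for manual annotation
        if label is not None:       # discarded candidates are not added
            expanded.append((sentence, label))
    return expanded
```

In a recursive setting, the classifiers would be retrained on the returned dataset before the next round; retraining is omitted here because the abstract does not specify the models.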