MILAB at PragTag-2023: Enhancing Cross-Domain Generalization through Data Augmentation with Reduced Uncertainty

Yoonsang Lee, Dongryeol Lee, Kyomin Jung

Published: 2023, Last Modified: 23 Aug 2024ArgMining@EMNLP 2023EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: This paper describes our submission to the PragTag task, which aims to categorize each sentence from peer reviews into one of the six distinct pragmatic tags. The task consists of three conditions: full, low, and zero, each distinguished by the number of training data and further categorized into five distinct domains. The main challenge of this task is the domain shift, which is exacerbated by non-uniform distribution and the limited availability of data across the six pragmatic tags and their respective domains. To address this issue, we predominantly employ two data augmentation techniques designed to mitigate data imbalance and scarcity: pseudo-labeling and synonym generation. We experimentally demonstrate the effectiveness of our approaches, achieving the first rank under the zero condition and the third in the full and low conditions.