Post-Training with Interrogative Sentences for Enhancing BART-based Korean Question Generator

Gyu-Min Park, Seong-Eun Hong, Seong-Bae Park

Published: 2022, Last Modified: 04 Aug 2024AACL/IJCNLP (2) 2022EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: The pre-trained language models such as KoBART often fail in generating perfect interrogative sentences when they are applied to Korean question generation. This is mainly due to the fact that the language models are much experienced with declarative sentences, but not with interrogative sentences. Therefore, this paper proposes a novel post-training of KoBART to enhance it for Korean question generation. The enhancement of KoBART is accomplished in three ways: (i) introduction of question infilling objective to KoBART to enforce it to focus more on the structure of interrogative sentences, (ii) augmentation of training data for question generation with another data set to cope with the lack of training instances for post-training, (iii) introduction of Korean spacing objective to make KoBART understand the linguistic features of Korean. Since there is no standard data set for Korean question generation, this paper also proposes KorQuAD-QG, a new data set for this task, to verify the performance of the proposed post-training. Our code are publicly available at https://github.com/gminipark/post_training_qg