Abstract: To alleviate the high cost of manually annotating Question Answering (QA) datasets, Question Generation (QG) has been proposed, which requires a model to generate a question relevant to a given passage and answer.
This work primarily focuses on Multi-Span Question Generation (MSQG), where the generated question corresponds to multiple candidate answers.
We observe that traditional QG methods may not suit MSQG, as they typically overlook the relations among the candidate answers and thus generate trivial questions.
To address it, we propose \textbf{REGULAR}, a framework of $\underline{\textbf{RE}}$lation-$\underline{\textbf{GU}}$ided Mu$\underline{\textbf{L}}$ti-Sp$\underline{\textbf{A}}$n Question Gene$\underline{\textbf{R}}$ation.
REGULAR first converts passages into knowledge graphs and extracts candidate answers from them.
Then, REGULAR uses a QG model to generate a set of candidate questions and a QA model to select the optimal one.
We construct two synthetic datasets, REGULAR-WIKI and REGULAR-MED, containing over 100,000 questions built from the Wikipedia and PubMed corpora respectively, and conduct experiments comparing them with other synthetic QA datasets.
The experimental results show that models pre-fine-tuned on our synthetic datasets achieve the best performance.
We also conduct ablation studies and statistical analysis to verify the quality of our synthetic dataset.
Our code and data are available at https://anonymous.4open.science/r/REGULAR-BC26.
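Below is a minimal, hypothetical sketch of the pipeline the abstract describes (passage to knowledge graph, multi-span answer extraction, QG candidate generation, QA-based selection). All function names, signatures, and implementation details are illustrative assumptions, not the authors' released API; see the linked repository for the actual implementation.

```python
# Hypothetical sketch of the REGULAR pipeline; names are assumptions, not the released code.
from typing import List, Tuple

def build_knowledge_graph(passage: str) -> List[Tuple[str, str, str]]:
    """Convert a passage into (subject, relation, object) triples,
    e.g. with an off-the-shelf open information extraction tool."""
    raise NotImplementedError

def extract_candidate_answers(triples: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Group graph nodes that share a relation into multi-span answer sets."""
    raise NotImplementedError

def generate_question_candidates(passage: str, answers: List[str]) -> List[str]:
    """Use a QG model to propose several candidate questions for one answer set."""
    raise NotImplementedError

def score_with_qa_model(passage: str, question: str, answers: List[str]) -> float:
    """Score a candidate question by how well a QA model recovers the answer set."""
    raise NotImplementedError

def regular_pipeline(passage: str) -> List[Tuple[str, List[str]]]:
    """End-to-end sketch: passage -> knowledge graph -> multi-span answer sets ->
    candidate questions -> best question per answer set, selected by the QA model."""
    triples = build_knowledge_graph(passage)
    qa_pairs = []
    for answers in extract_candidate_answers(triples):
        candidates = generate_question_candidates(passage, answers)
        best = max(candidates, key=lambda q: score_with_qa_model(passage, q, answers))
        qa_pairs.append((best, answers))
    return qa_pairs
```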
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: question generation, reading comprehension
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 2676