Abstract: To alleviate the high cost of manually annotating Question Answering (QA) datasets, Question Generation (QG) has been proposed, which requires a model to generate a question relevant to a given passage and answer.
This work primarily focuses on Multi-Span Question Generation (MSQG), where the generated question corresponds to multiple candidate answers.
We observe that traditional QG methods may not suit MSQG, as they typically overlook the relations among the candidate answers and thus generate trivial questions.
To address it, we propose \textbf{REGULAR}, a framework of $\underline{\textbf{RE}}$lation-$\underline{\textbf{GU}}$ided Mu$\underline{\textbf{L}}$ti-Sp$\underline{\textbf{A}}$n Question Gene$\underline{\textbf{R}}$ation.
REGULAR first converts passages into knowledge graphs and extracts candidate answers from them.
Then, REGULAR uses a QG model to generate a set of candidate questions and a QA model to select the optimal one.
We construct two synthetic datasets, REGULAR-WIKI and REGULAR-MED, containing over 100,000 questions built from the Wikipedia and PubMed corpora respectively, and conduct experiments comparing them with other synthetic QA datasets.
The experimental results show that models pre-fine-tuned on our synthetic datasets achieve the best performance.
We also conduct ablation studies and statistical analysis to verify the quality of our synthetic dataset.
Our code and data are available at https://anonymous.4open.science/r/REGULAR-BC26.
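Below is a minimal, hypothetical sketch of the pipeline the abstract describes (passage to knowledge graph, multi-span answer extraction, QG candidate generation, QA-based selection). All function names, signatures, and implementation details are illustrative assumptions, not the authors' released API; see the linked repository for the actual implementation.

```python
# Hypothetical sketch of the REGULAR pipeline; names are assumptions, not the released code.
from typing import List, Tuple

def build_knowledge_graph(passage: str) -> List[Tuple[str, str, str]]:
    """Convert a passage into (subject, relation, object) triples,
    e.g. with an off-the-shelf open information extraction tool."""
    raise NotImplementedError

def extract_candidate_answers(triples: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Group graph nodes that share a relation into multi-span answer sets."""
    raise NotImplementedError

def generate_question_candidates(passage: str, answers: List[str]) -> List[str]:
    """Use a QG model to propose several candidate questions for one answer set."""
    raise NotImplementedError

def score_with_qa_model(passage: str, question: str, answers: List[str]) -> float:
    """Score a candidate question by how well a QA model recovers the answer set."""
    raise NotImplementedError

def regular_pipeline(passage: str) -> List[Tuple[str, List[str]]]:
    """End-to-end sketch: passage -> knowledge graph -> multi-span answer sets ->
    candidate questions -> best question per answer set, selected by the QA model."""
    triples = build_knowledge_graph(passage)
    qa_pairs = []
    for answers in extract_candidate_answers(triples):
        candidates = generate_question_candidates(passage, answers)
        best = max(candidates, key=lambda q: score_with_qa_model(passage, q, answers))
        qa_pairs.append((best, answers))
    return qa_pairs
```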
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: question generation, reading comprehension
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 2676