QSTS: A Question-Sensitive Text Similarity Measure for Question Generation.

07 Oct 2022OpenReview Archive Direct UploadReaders: Everyone
Abstract: While question generation (QG) has received significant focus in conversation modeling and text generation research, the problems of comparing questions and evaluation of QG models have remained inadequately addressed. Indeed, QG models continue to be evaluated using traditional measures such as BLEU, METEOR, and ROUGE scores which were designed for other text generation problems. We propose QSTS, a novel Question-Sensitive Text Similarity measure for questions that characterizes their target intent based on question class, named-entity, and semantic similarity information. We show that QSTS addresses several shortcomings of existing measures that depend on n-gram overlap scores and obtains superior results compared to traditional measures on publicly-available QG datasets. We also collect a novel dataset SimQG for enabling question similarity research in QG contexts. SimQG contains questions generated by state-of-the-art QG models along with human judgements on their relevance with respect to passage contexts as well as the given reference questions. Using SimQG, we showcase the key aspect of QSTS that differentiates it from all existing measures. QSTS is not only able to characterize similarity between two questions, but is also able to score questions with respect to passage contexts. Thus QSTS is, to our knowledge, the first metric that enables the measurement of QG performance in a reference-free manner
0 Replies

Loading