QSTS: A Question-Sensitive Text Similarity Measure for Question Generation.
Abstract: While question generation (QG) has received significant focus in conversation modeling and
text generation research, the problems of comparing questions and evaluating QG models
have remained inadequately addressed. Indeed, QG models continue to be evaluated using traditional measures such as BLEU, METEOR, and ROUGE scores, which were designed for other text generation problems. We propose QSTS, a
novel Question-Sensitive Text Similarity measure for questions that characterizes their target
intent based on question class, named-entity, and semantic similarity information.
We show that QSTS addresses several shortcomings of existing measures that depend on
n-gram overlap scores and obtains superior results compared to traditional measures on
publicly available QG datasets. We also collect a novel dataset, SimQG, to enable question
similarity research in QG contexts. SimQG contains questions generated by state-of-the-art
QG models along with human judgements on their relevance with respect to passage contexts
as well as the given reference questions. Using SimQG, we showcase the key aspect of QSTS
that differentiates it from all existing measures. QSTS is not only able to characterize similarity
between two questions, but is also able to score questions with respect to passage contexts. Thus, QSTS is, to our knowledge, the
first metric that enables the measurement of QG performance in a reference-free manner.