Keywords: entailment, text classification
TL;DR: Using dialogue reconstruction tasks to measure the conversational coherence of language models
Abstract: Although many language models achieve high accuracy on language comprehension tasks, their true coherence remains low: the models often rely on spurious correlations in their input text to reach that accuracy. In this project we propose a conversation reconstruction task to test models' true comprehension and coherence. We will provide different models with a random permutation of conversation segments and, optionally, a hypothesis together with a label indicating whether the conversation entails it. We will then measure each model's accuracy in reconstructing the original conversation order, training with either a binary cross-entropy loss or a mean squared error computed from the Manhattan distance between the model's prediction and the ground truth, to see which objective yields better results.
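The abstract names two candidate objectives but does not fix their exact form, so the following is a minimal sketch of one plausible reading: the binary cross-entropy variant scores pairwise "segment i precedes segment j" probabilities, and the MSE variant squares the per-segment Manhattan (L1) displacement between predicted and ground-truth positions. The function names, input shapes, and the pairwise formulation are illustrative assumptions, not details from the proposal.

```python
import math

def bce_ordering_loss(pairwise_probs, true_order):
    """Binary cross-entropy over pairwise precedence predictions (assumed formulation).

    pairwise_probs: dict mapping (i, j) -> predicted probability that segment i
                    appears before segment j in the original conversation.
    true_order:     list giving the ground-truth position of each segment index.
    """
    eps = 1e-12  # guard against log(0)
    total, count = 0.0, 0
    for (i, j), p in pairwise_probs.items():
        label = 1.0 if true_order[i] < true_order[j] else 0.0
        total += -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
        count += 1
    return total / max(count, 1)

def manhattan_mse_loss(pred_positions, true_positions):
    """Mean squared error of the per-segment Manhattan (L1) displacement
    between predicted and ground-truth segment positions (assumed formulation)."""
    assert len(pred_positions) == len(true_positions)
    squared = [abs(p - t) ** 2 for p, t in zip(pred_positions, true_positions)]
    return sum(squared) / len(squared)

# Toy example with three segments whose true order is [2, 0, 1]
# (segment 0 belongs at position 2, segment 1 at position 0, segment 2 at position 1):
probs = {(0, 1): 0.3, (0, 2): 0.2, (1, 2): 0.9}
print(bce_ordering_loss(probs, true_order=[2, 0, 1]))
print(manhattan_mse_loss(pred_positions=[1, 0, 2], true_positions=[2, 0, 1]))
```

For integer position indices the squared Manhattan displacement coincides with an ordinary squared error per segment; the distinction would only matter if positions were represented as vectors, which the proposal leaves open.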
Archival Option: Yes
Submission Number: 7