Keywords: entailment, text classification
TL;DR: Using dialogue reconstruction tasks to measure the conversational coherence of language models
Abstract: Although many language models achieve high accuracy on language comprehension tasks, their true coherence remains low: the models often rely on spurious correlations in their input text to reach that accuracy. In this project we propose a conversation reconstruction task to test models' true comprehension and coherence. We will provide different models with a random permutation of conversation segments and, optionally, a hypothesis together with a label indicating whether the conversation entails it. We will then measure each model's accuracy in reconstructing the original conversation order, training with either a binary cross-entropy loss or a mean squared error computed from the Manhattan distance between the model's prediction and the ground truth, to see which objective yields better results.
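The abstract names two candidate objectives but does not fix their exact form, so the following is a minimal sketch of one plausible reading: the binary cross-entropy variant scores pairwise "segment i precedes segment j" probabilities, and the MSE variant squares the per-segment Manhattan (L1) displacement between predicted and ground-truth positions. The function names, input shapes, and the pairwise formulation are illustrative assumptions, not details from the proposal.

```python
import math

def bce_ordering_loss(pairwise_probs, true_order):
    """Binary cross-entropy over pairwise precedence predictions (assumed formulation).

    pairwise_probs: dict mapping (i, j) -> predicted probability that segment i
                    appears before segment j in the original conversation.
    true_order:     list giving the ground-truth position of each segment index.
    """
    eps = 1e-12  # guard against log(0)
    total, count = 0.0, 0
    for (i, j), p in pairwise_probs.items():
        label = 1.0 if true_order[i] < true_order[j] else 0.0
        total += -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
        count += 1
    return total / max(count, 1)

def manhattan_mse_loss(pred_positions, true_positions):
    """Mean squared error of the per-segment Manhattan (L1) displacement
    between predicted and ground-truth segment positions (assumed formulation)."""
    assert len(pred_positions) == len(true_positions)
    squared = [abs(p - t) ** 2 for p, t in zip(pred_positions, true_positions)]
    return sum(squared) / len(squared)

# Toy example with three segments whose true order is [2, 0, 1]
# (segment 0 belongs at position 2, segment 1 at position 0, segment 2 at position 1):
probs = {(0, 1): 0.3, (0, 2): 0.2, (1, 2): 0.9}
print(bce_ordering_loss(probs, true_order=[2, 0, 1]))
print(manhattan_mse_loss(pred_positions=[1, 0, 2], true_positions=[2, 0, 1]))
```

For integer position indices the squared Manhattan displacement coincides with an ordinary squared error per segment; the distinction would only matter if positions were represented as vectors, which the proposal leaves open.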
Archival Option: Yes
Submission Number: 7