Beyond Words: A Topological Exploration of Coherence in Text Documents

Published: 19 Mar 2024, Last Modified: 17 Apr 2024
Venue: Tiny Papers @ ICLR 2024 Present
License: CC BY 4.0
Keywords: Text Coherence, NLP, Attention Maps, Topological Data Analysis, Persistent Homology
TL;DR: We present a method of analyzing text coherence based on topological data analysis techniques applied on top of attention graphs generated for the text.
Abstract: Coherence is a pivotal metric for evaluating the quality of a text. It quantifies how well the sentences within the text are connected and how well the text is structured and organized. It plays a vital role in downstream Natural Language Processing tasks such as text summarization, question answering, and machine translation. In this work, we explore the use of topological data analysis (TDA) techniques on attention graphs of text documents to model coherence. TDA techniques are known to capture structural information and patterns in data, making them suitable for modeling the $\textit{structure}$ and $\textit{flow}$ of a document, i.e. its coherence. We validate our approach with experiments on the GCDC dataset, achieving state-of-the-art results with a simple MLP.
Supplementary Material: zip
Submission Number: 210
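
Illustrative sketch (not the authors' implementation): the abstract describes applying persistent homology to attention graphs and feeding the result to an MLP, but it does not specify the filtration, the homology dimensions, or the features used. The snippet below is a minimal example under stated assumptions: the attention map is a small hand-written matrix rather than one extracted from a model, attention weights are symmetrized with `max` and converted to distances as `1 - w`, and only 0-dimensional persistence is computed. The function names `h0_barcode` and `barcode_features` are hypothetical. For a weighted graph, every 0-dimensional bar is born at 0 and dies at the weight of a minimum-spanning-tree edge, so a union-find pass over the sorted edges is enough.

```python
def h0_barcode(attention):
    """Return death times of the finite 0-dimensional persistence bars.

    Assumptions (not from the paper): edge weight = max(a_ij, a_ji),
    distance = 1 - weight, and every vertex is born at filtration value 0.
    """
    n = len(attention)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            weight = max(attention[i][j], attention[j][i])
            edges.append((1.0 - weight, i, j))
    edges.sort()  # add edges in order of increasing distance

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one bar dies at this distance.
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths


def barcode_features(deaths):
    """Summarize a barcode as [total length, mean length, max length]."""
    if not deaths:
        return [0.0, 0.0, 0.0]
    total = sum(deaths)
    return [total, total / len(deaths), max(deaths)]


# Toy 4-token attention map (each row sums to 1); values are made up.
toy_attention = [
    [0.70, 0.20, 0.05, 0.05],
    [0.30, 0.60, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
    [0.10, 0.10, 0.20, 0.60],
]

deaths = h0_barcode(toy_attention)
features = barcode_features(deaths)
print(deaths, features)
```

Feature vectors of this kind, computed per attention head or per layer, could then be concatenated and passed to a classifier such as the MLP mentioned in the abstract; which features the paper actually uses is not stated here.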