Context Is The Key For LLM-Based Text Segmentation

ACL ARR 2024 December Submission1184 Authors

15 Dec 2024 (modified: 05 Feb 2025) · CC BY 4.0
Abstract: Text segmentation (TS) involves dividing text into coherent sections, typically defined by topic. Over the past decade, research has largely focused on advancing supervised TS techniques, leaving unsupervised TS comparatively underdeveloped. With the advent of large language models (LLMs) and their growing accessibility, unsupervised TS stands to benefit. By leveraging an LLM's strong understanding of natural language, prompting appropriately, and supplying valuable context, we show that even locally run, open-source LLMs can achieve state-of-the-art unsupervised TS results as benchmarked by Pk and WindowDiff scores.
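The abstract evaluates segmentation quality with the standard Pk and WindowDiff metrics. As a reference point for readers, here is a minimal sketch of how these two metrics are conventionally computed over binary boundary sequences (1 = segment boundary after a sentence, 0 = no boundary); the function names and the boundary-array representation are illustrative choices, not taken from the paper:

```python
def pk(ref, hyp, k):
    """Pk (Beeferman et al., 1999): probability that a sliding window of
    size k disagrees on whether it contains a boundary. Lower is better."""
    n = len(ref)
    errors = sum(
        1 for i in range(n - k)
        # Compare boundary *presence* in the two windows.
        if (sum(ref[i:i + k]) > 0) != (sum(hyp[i:i + k]) > 0)
    )
    return errors / (n - k)


def window_diff(ref, hyp, k):
    """WindowDiff (Pevzner & Hearst, 2002): fraction of windows where the
    boundary *counts* in reference and hypothesis differ. Lower is better."""
    n = len(ref)
    errors = sum(
        1 for i in range(n - k)
        # Compare boundary counts, penalizing near-misses less harshly
        # than Pk's presence/absence test.
        if sum(ref[i:i + k]) != sum(hyp[i:i + k])
    )
    return errors / (n - k)
```

In practice, k is usually set to half the average reference segment length; a perfect segmentation scores 0.0 on both metrics. Off-the-shelf implementations are also available in NLTK (`nltk.metrics.segmentation`).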
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: Natural Language Processing, Text Segmentation, LLMs, Natural Language Understanding, Unsupervised Text Segmentation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 1184