Abstract: Long-context understanding poses significant challenges in natural language processing, particularly for real-world dialogues characterized by speech-based elements, high redundancy, and uneven information density. Although large language models (LLMs) achieve impressive results on existing benchmarks, these datasets fail to reflect the complexities of such texts, limiting their applicability to practical scenarios. To bridge this gap, we construct the first spoken long-text dataset, derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-world scenarios.
We design tasks across three main categories—retrieval-dependent, reasoning-dependent, and hybrid—and evaluate both popular LLMs and specialized methods on their ability to understand long contexts across these tasks. Our results reveal that current methods struggle to process highly redundant texts effectively, showing clear preferences for specific task types but with no single method excelling across all tasks. Based on these findings, we propose a simple yet strong baseline that addresses these challenges and achieves substantial performance improvements. Our analysis offers valuable insights into the strengths and limitations of existing methods for processing spoken texts, laying the groundwork for advancing long-text understanding in real-world applications. As the first benchmark specifically designed for spoken long-text understanding, it not only tackles key challenges in this domain but also serves as a valuable resource for driving innovation in e-commerce applications.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Long-context understanding, spoken language, KV cache compression
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 5965