Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events

ACL ARR 2024 December Submission1582 Authors

16 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · Readers: Everyone · CC BY 4.0
Abstract: State-of-the-art automatic event detection struggles with interpretability and with adapting to evolving large-scale key events; episodic structures, by contrast, excel in both respects. Episodes, though often overlooked, are cohesive clusters of core entities (e.g., protesters, police) performing actions at a specific time and location, and each key event can be represented as a partially ordered sequence of episodes. This paper introduces a novel task, **episode detection**, which identifies the episodes within a news corpus of articles about a key event. Detecting episodes poses unique challenges: episodes lack explicit temporal or locational markers and cannot be merged using semantic similarity alone. While large language models (LLMs) can help with these reasoning difficulties, they struggle with the long contexts typical of news corpora. To address these challenges, we introduce **EpiMine**, an unsupervised framework that identifies a key event's candidate episodes by leveraging natural episodic partitions in articles, which it estimates from shifts in discriminative term combinations. These candidate episodes are more cohesive and more representative of true episodes, so they synergize with LLMs, which can then better interpret them and refine them into final episodes. We apply EpiMine to our three diverse, real-world event datasets, annotated at the episode level, where it achieves an average improvement of 59.2% across all metrics over baselines.
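The abstract says only that EpiMine estimates episodic partitions from shifts in discriminative term combinations; it does not give the scoring or boundary rule. The sketch below illustrates that general idea under stated assumptions and is not EpiMine's actual method. It assumes the following: term discriminativeness is a smoothed log-ratio of frequency in the key-event corpus versus a background corpus; each paragraph is summarized by its top-k positively scored terms; and a new segment starts wherever the Jaccard overlap between consecutive paragraphs' key-term sets falls to a threshold or below. The function names `discriminative_scores`, `key_terms`, `jaccard`, and `segment_article` and the parameters `top_k` and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def discriminative_scores(event_docs, background_docs):
    """Hypothetical scoring: smoothed log-ratio of each event-corpus term's
    relative frequency in the key-event corpus vs. a background corpus.
    Positive scores mean the term is over-represented in the event corpus."""
    ev = Counter(t for doc in event_docs for t in doc)
    bg = Counter(t for doc in background_docs for t in doc)
    vocab_size = len(set(ev) | set(bg))
    ev_total, bg_total = sum(ev.values()), sum(bg.values())
    return {
        t: math.log((ev[t] + 1) / (ev_total + vocab_size))
        - math.log((bg[t] + 1) / (bg_total + vocab_size))
        for t in ev
    }


def key_terms(paragraph, scores, top_k=3):
    """Top-k positively scored (discriminative) terms of one tokenized paragraph."""
    candidates = [t for t in set(paragraph) if scores.get(t, 0.0) > 0.0]
    candidates.sort(key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return set(candidates[:top_k])


def jaccard(a, b):
    """Jaccard similarity; two empty sets are treated as identical (no shift)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def segment_article(paragraphs, scores, top_k=3, threshold=0.2):
    """Group paragraph indices into contiguous candidate segments, starting a new
    segment when key-term overlap with the previous paragraph is <= threshold."""
    segments, prev = [], None
    for idx, para in enumerate(paragraphs):
        terms = key_terms(para, scores, top_k)
        if prev is None or jaccard(prev, terms) <= threshold:
            segments.append([idx])
        else:
            segments[-1].append(idx)
        prev = terms
    return segments


# Usage with hand-set (hypothetical) scores: two protest paragraphs, then a shift.
scores = {"protest": 2.0, "police": 1.8, "tear": 1.5, "gas": 1.5,
          "ceasefire": 2.2, "talks": 1.9, "the": -3.0}
article = [
    ["the", "protest", "police", "tear", "gas"],
    ["the", "police", "tear", "gas", "protest"],
    ["the", "ceasefire", "talks"],
]
print(segment_article(article, scores))  # [[0, 1], [2]]
```

A fixed Jaccard threshold keeps the sketch simple. A real system would likely tune or learn the boundary criterion and pass the resulting segments to later clustering and LLM refinement steps, as the abstract describes.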
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: event extraction, document-level extraction, zero/few-shot extraction
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 1582