Abstract: Coding, the method of labeling and organizing qualitative data, is commonly used in social science studies. For example, to provide an understanding of political violence around the world, an international non-profit group, ACLED (Armed Conflict Location and Event Data), has been collecting and coding reports of protests and conflicts for over a decade. Using this high-quality manually collected data, we create ACLED-DS, a dataset of 45,426 armed conflict events spanning 22 languages and 172 countries, with extensive coverage of region-specific entities.
Building on this real-world dataset, we motivate a modification to the traditional event extraction task. We propose the task of abstractive event extraction (AEE) and entity linking: events are extracted by a holistic understanding of the entire document, and all event arguments are normalized. This formulation simplifies applications such as aggregating information from diverse sources in different languages to understand global trends and patterns.
We introduce Zest, a novel zero-shot AEE system based on large language models. On ACLED-DS, Zest achieves 77.6% and 82.9% $F_1$ on event detection and abstractive event argument extraction, respectively. Zest outperforms GoLLIE, a state-of-the-art information extraction model, even when tested on English data and after GoLLIE is fine-tuned on 12,000 examples. For the entity linking subtask, Zest achieves 40.8% compared to 11.0% for OneNet, a strong LLM-based baseline. While these results establish Zest as a strong baseline for this new dataset, they also highlight important challenges in entity linking within this global and highly specialized context.
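To make the task formulation concrete, the following is a minimal sketch of how a document-level event with normalized arguments might be represented and scored. It is an illustration only: the `Event` record, its field names, the entity identifiers, and the micro-averaged $F_1$ over (role, value) pairs are all assumptions made for this example, not the schema or scoring procedure used for ACLED-DS or Zest.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical document-level event record (names are illustrative)."""
    event_type: str
    # Arguments are (role, normalized value) pairs, e.g. a knowledge-base
    # identifier rather than a text span copied from the document.
    arguments: frozenset = field(default_factory=frozenset)


def micro_f1(gold, predicted):
    """Micro-averaged F1 over argument pairs of position-aligned events."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g.arguments & p.arguments)   # pairs present in both
        fp += len(p.arguments - g.arguments)   # predicted but not in gold
        fn += len(g.arguments - p.arguments)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up identifiers: one argument matches, one is linked to the wrong entity.
gold = [Event("protest", frozenset({("location", "ENT_1"), ("actor", "ENT_2")}))]
pred = [Event("protest", frozenset({("location", "ENT_1"), ("actor", "ENT_3")}))]
score = micro_f1(gold, pred)
```

Because arguments are compared as normalized identifiers rather than surface strings, reports in different languages that refer to the same entity would count as the same argument under this sketch.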
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: event extraction, entity linking, document-level extraction, multilingual extraction, zero-shot extraction
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: English, Arabic, Spanish, Portuguese, Turkish, Burmese, Korean, French, German, Indonesian, Italian, Persian, Ukrainian, Russian, Somali, Nepali, Hebrew, Chinese, Polish, Dutch, Hindi, Japanese.
Submission Number: 2380