Adaptive scalable pipelines for political event data generation

15 Feb 2023, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: Political event data has become increasingly important for researchers who study and predict global events. Until recently, the majority of political events were hand-coded from text, limiting the timeliness and coverage of event data sets. Recent systems have successfully employed big data systems to extract events from text, but these automated event systems have been limited by either slow performance or high infrastructure demands. In this work, we present a new approach to big data systems that allows for faster extraction than existing systems. We describe a modular system, Biryani, that adaptively extracts events from batches of documents. We use distributed containers to process streams of incoming documents. The number of containers processing documents can be increased or reduced depending on the resources available. The optimal configuration for event extraction is learned, and the system adapts to maximize the throughput of coded documents. We demonstrate this adaptability through experiments running on laptops and on multiple commodity machines. We use this system to extract a new political event data set from several terabytes of text data.