Abstract: Political event data have become increasingly important
for researchers studying and predicting global events. Until recently,
the majority of political events were hand-coded from text, limiting the timeliness and coverage of event data sets. Recent systems
have successfully employed big data systems for extracting events
from text. These automated event systems have been limited by
either slow performance or high infrastructure demands. In
this work, we present a new approach to big data systems that
allows for faster extraction than existing systems.
We describe a modular system, Biryani, that adaptively extracts
events from batches of documents. We use distributed containers
to process streams of incoming documents. The number of
containers processing documents can be increased or reduced
depending on the resources available. The optimal
configuration for event extraction is learned, and the system
adapts to maximize the throughput of coded documents. We
demonstrate this adaptability through experiments running on laptops
and on multiple commodity machines. We use this system to extract
a new political event data set from several terabytes of text data.
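
The idea of adjusting the number of containers to maximize throughput can be illustrated with a minimal sketch. The abstract does not say how Biryani learns its configuration, so the hill-climbing search below is an assumption chosen for illustration, not the authors' method. The names `tune_worker_count`, `measure`, and `simulated_throughput` are hypothetical, and the throughput curve is synthetic rather than a measured result.

```python
from typing import Callable


def tune_worker_count(
    measure: Callable[[int], float], max_workers: int, start: int = 1
) -> int:
    """Hill-climb over worker counts and return a locally best count.

    `measure(n)` is assumed to return the observed throughput
    (documents per second) when n containers are running.
    """
    current = max(1, min(start, max_workers))
    best = measure(current)
    while True:
        # Try one fewer and one more worker, staying within bounds.
        neighbors = [n for n in (current - 1, current + 1) if 1 <= n <= max_workers]
        scored = [(measure(n), n) for n in neighbors]
        if not scored:
            return current
        top_rate, top_n = max(scored)
        if top_rate <= best:
            # No neighbor improves throughput: stop at the current count.
            return current
        current, best = top_n, top_rate


def simulated_throughput(n: int) -> float:
    """Synthetic curve (an assumption): scales linearly up to 6 workers,
    then degrades as workers contend for the same resources."""
    return n * 100.0 if n <= 6 else 600.0 - (n - 6) * 50.0


if __name__ == "__main__":
    print(tune_worker_count(simulated_throughput, max_workers=12))
```

In a real deployment, `measure` would start or stop containers and time a batch of documents. Because each measurement is noisy and costly, a practical version would likely average several batches before moving to a new worker count.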