- Keywords: social network analysis, natural language processing, time series analysis, information retrieval
- TL;DR: Detection of important events in news streams.
- Abstract: Detecting important events in high-volume news streams is a key task for a variety of purposes. The volume and rate of online news increase the need for automated event detection methods that can operate in real time. In this paper, we develop a network-based approach that makes the working assumption that important news events always involve named entities (such as persons, locations and organizations) that are linked in news articles. Our approach uses natural language processing techniques to detect these entities in a stream of news articles and then creates a time-stamped series of networks in which the detected entities are linked by co-occurrence in articles and sentences. In this prototype, weighted node degree is tracked over time and change-point detection is used to locate important events. Potential events are characterized and distinguished using community detection on KeyGraphs that relate named entities and informative noun phrases from related articles. We performed an evaluation against human annotation and against simple text-based clustering, finding that our system detects more events than simple clustering and compares well to human annotations. This methodology already produces promising results and will be extended in the future to include a wider variety of complex network analysis techniques.
- Archival status: Archival
- Subject areas: Natural Language Processing, Information Extraction
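
The pipeline described in the abstract (entity co-occurrence networks per time window, weighted node degree tracked over time, then change detection) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `weighted_degree_series` and `flag_changes`, the input format of `(window_index, entities)` pairs with entities already extracted, and the detection rule itself, which is a simple rolling mean and standard deviation threshold standing in for whatever change-point method the paper actually uses.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def weighted_degree_series(articles, num_windows):
    """Return {entity: [weighted degree in each time window]}.

    `articles` is an iterable of (window_index, entities) pairs, where
    `entities` lists the named entities found in one article (hypothetical
    input format; entity extraction is assumed to have happened upstream).
    """
    series = defaultdict(lambda: [0] * num_windows)
    for window, entities in articles:
        # Each pair of distinct entities co-occurring in an article adds
        # weight 1 to their edge, hence 1 to each endpoint's weighted degree.
        for a, b in combinations(sorted(set(entities)), 2):
            series[a][window] += 1
            series[b][window] += 1
    return dict(series)


def flag_changes(values, history=3, threshold=3.0):
    """Flag windows whose value jumps well above the recent past.

    Illustrative stand-in for change-point detection: window t is flagged
    when values[t] exceeds the mean of the previous `history` windows by
    more than `threshold` times their standard deviation (floored at 1.0
    so that a flat history does not make every increase look significant).
    """
    flagged = []
    for t in range(history, len(values)):
        past = values[t - history:t]
        spread = max(stdev(past), 1.0)
        if values[t] - mean(past) > threshold * spread:
            flagged.append(t)
    return flagged
```

Calling `flag_changes(weighted_degree_series(articles, n)[entity])` returns the window indices in which that entity's connectivity rises sharply, which would be the candidate events passed on to the characterization step.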