Abstract: Identifying relationships between news articles in order to cluster them into real-
world events is a fundamental task for analyzing the news media ecosystem. Many
existing approaches rely primarily on semantic similarity, which can lead to incorrect
groupings when articles share similar topics but refer to different events. In this work,
we propose a multi-signal graph-based pipeline that integrates several sources of
information to better model relationships between news articles. Using a gold-standard
dataset of worldwide news events, the proposed method extracts multiple similarity
signals, including semantic representations and entity-based information, which are
combined to compare articles and identify event-level relationships. Experimental
evaluation demonstrates that the proposed pipeline significantly improves clustering
performance compared to a semantic similarity baseline and traditional approaches
to this task. The method achieves 94.7% homogeneity and 85.8% completeness while
maintaining 89.4% article coverage in the final clusters. These results indicate that
combining multiple signals enables more accurate identification of relationships
between articles, leading to more reliable clustering of news into meaningful real-world
events.