Abstract: News event discovery is the task of detecting and identifying news events from multimodal social media data. Most existing work assumes that the test set contains only known events. In practice, new events emerge continually, which invalidates this assumption. In this paper, we propose a Dynamic Augmentation and Entropy Optimization (DAEO) model for generalized news event discovery, a setting that requires the model not only to identify known events but also to distinguish among various new events. Specifically, we first introduce a multimodal augmentation module that uses adversarial learning to strengthen the multimodal representations. Second, we design an adaptive entropy optimization strategy combined with self-distillation, which exploits the consistency of pseudo-labels across multiple views to improve performance on both known and new events. In addition, we collect a multimodal news event discovery (MNED) dataset of 161,350 samples annotated with 66 real-world events. Extensive experiments on the MNED dataset demonstrate the effectiveness of the proposed method. Our dataset is available at https://anonymous.4open.science/r/2FF5.
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: This work contributes to multimedia applications by addressing a central challenge in news event discovery: adaptively recognizing and categorizing both previously known and newly emerging news events from multimodal data. The proposed Dynamic Augmentation and Entropy Optimization (DAEO) model integrates multiple data modalities, which matches the conference's interest in combining and exploiting varied data types. We validate the approach through extensive experiments on the newly compiled Multimodal News Event Discovery (MNED) dataset, which also serves as a resource for future research in this area. The work thus aligns directly with the conference's focus on multimodal processing and its real-world applications.
Supplementary Material: zip
Submission Number: 2811