[Work-in-Progress] Multi-Instance Learning for Social Media-Based Spatiotemporal Public Opinion Analysis

Shanshan Bai; Anna Kruspe; Xiao Xiang Zhu

Published: 26 Jul 2025, Last Modified: 06 Oct 2025

Venue: NLPOR 2025

License: CC BY 4.0

Keywords: Multi-instance learning, Social Media, Public opinion

Submission Type: Non-Archival

Abstract: This work-in-progress explores applying the framework of multi-instance learning (MIL) to spatiotemporally grounded public opinion analysis using social media data. While traditional surveys offer depth and precision, social media provides a scalable, cost-effective complement for real-time tracking of public sentiment. However, weak supervision in data collection often results in a large volume of ambiguous or uninformative posts, complicating both prediction accuracy and interpretability. We address these challenges by framing public opinion analysis as a MIL task, where social media posts (instances) are grouped into bags based on shared spatial (e.g., city, region) or temporal (e.g., daily, weekly intervals) attributes. This formulation supports learning both at the bag level (e.g., tracking how opinion shifts over time or across locations) and at the instance level (e.g., identifying specific posts that drive a shift or reflect conflicting viewpoints).

In recently completed but unpublished work, we treated geo-tagged tweets from specific buildings as instances and used non-deep MIL models to infer building functionality. That study demonstrated MIL’s ability to handle noisy data and model rare or underrepresented classes. Building on this, we are developing a more robust MIL framework aimed at public opinion modeling. Drawing on established use cases of MIL in computer vision (e.g., tumor region identification) and NLP (e.g., document-level sentiment and relation extraction), we define bags by shared spatiotemporal and demographic features and pursue two core objectives:

(1) Implicit Noise Handling: MIL enables the model to learn directly from weakly labeled data by distinguishing informative from uninformative instances without explicit filtering.

(2) Interpretability via Instance Scoring: By modeling both the bag and its constituent instances, the framework reveals which posts contribute to opinion dynamics or internal disagreement in a region or time window.

While our current work focuses on developing the MIL framework and evaluating its suitability for spatiotemporal opinion modeling and interpretability, we acknowledge that selecting a specific public opinion task, dataset, and labeling strategy is essential for empirical validation. To that end, we are currently surveying existing social media datasets with geo-temporal metadata (e.g., Twitter, Reddit) and exploring options for weak labeling. Our aim is to apply this framework to a real-world public opinion case study, enhancing the accountability, transparency, and actionability of models trained on noisy, weakly supervised social media data.
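Illustrative sketch (not the authors' implementation): the bag construction and instance scoring described in the abstract could look roughly like the Python below. It groups posts into bags keyed by region and ISO week, then combines per-post opinion scores with softmax attention weights, so that the weights double as instance-level importance scores. Attention pooling is one common MIL aggregation, chosen here only for illustration; the abstract does not state which pooling the framework uses. The names `make_bags` and `attention_pool`, the city names, and all numeric values are hypothetical, and in a real model the opinion scores and relevance logits would be learned rather than set by hand.

```python
import math
from collections import defaultdict
from datetime import date


def make_bags(posts):
    """Group posts (instances) into bags keyed by (region, ISO year, ISO week)."""
    bags = defaultdict(list)
    for post in posts:
        iso = post["day"].isocalendar()  # (ISO year, ISO week, ISO weekday)
        bags[(post["region"], iso[0], iso[1])].append(post)
    return dict(bags)


def attention_pool(scores, relevance):
    """Aggregate instance scores into one bag score.

    `relevance` holds one logit per instance; a softmax turns the logits into
    weights that sum to 1. The weights are returned as well, because they
    indicate how much each post contributed to the bag-level score.
    """
    peak = max(relevance)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in relevance]
    total = sum(exps)
    weights = [e / total for e in exps]
    bag_score = sum(w * s for w, s in zip(weights, scores))
    return bag_score, weights


# Toy posts with hand-set placeholder values (assumptions, not real data).
posts = [
    {"region": "Munich", "day": date(2025, 3, 3), "opinion": 0.9, "relevance": 2.0},
    {"region": "Munich", "day": date(2025, 3, 5), "opinion": -0.2, "relevance": -1.0},
    {"region": "Munich", "day": date(2025, 3, 12), "opinion": 0.1, "relevance": 0.0},
    {"region": "Berlin", "day": date(2025, 3, 4), "opinion": -0.7, "relevance": 1.0},
]

bags = make_bags(posts)
for key, members in sorted(bags.items()):
    bag_score, weights = attention_pool(
        [p["opinion"] for p in members],
        [p["relevance"] for p in members],
    )
    print(key, round(bag_score, 3), [round(w, 3) for w in weights])
```

In this toy setup, a post with a low relevance logit receives a small weight and barely moves the bag score, which is one way the "implicit noise handling" objective can play out without an explicit filtering step.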

Submission Number: 14
