Data Generation for Policy-Grounded Stance Detection using Sparse Supervision

ACL ARR 2025 May Submission 5892 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: We propose a joint classification task that identifies both the relevant Sustainable Development Goals (SDGs) and the stance (Supportive, Contrary, or Neutral) that a text expresses toward each goal. To address the lack of labeled data, we generate a synthetic training corpus by prompting GPT-4o-mini with expanded and contrastive versions of the 169 official SDG targets. We train a RoBERTa-based model with a semi-supervised objective that combines cross-entropy with a KL divergence term, which encourages calibrated stance distributions under a neutrality-biased Dirichlet prior. On two human-annotated benchmarks, academic texts from the OSDG dataset and policy bullet points from the 2024 UN SDG Progress Report, our model outperforms sentence-transformer baselines adapted for zero-shot stance inference. Qualitative analysis reveals plausible reasoning patterns and generalization across domains, though the model tends to overpredict Neutral in ambiguous cases. Our results suggest that structured generation from policy targets can support scalable SDG alignment models even under partial supervision. We release code, data, and evaluation tools to facilitate future work.
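The abstract does not give the exact form of the KL term, so the following is only a minimal sketch under one simple reading. It assumes the stance head emits a softmax distribution over (Supportive, Contrary, Neutral) in that order, applies cross-entropy only when a gold stance is available, and adds a KL penalty pulling the prediction toward the mean of a Dirichlet prior whose concentration favors Neutral. The prior `alpha=(1.0, 1.0, 2.0)`, the weight `lam`, the KL direction, and every function name here are hypothetical illustration choices, not the authors' settings.

```python
import math

# Hypothetical label order, taken from the order the abstract lists the stances.
STANCES = ("Supportive", "Contrary", "Neutral")


def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_mean(alpha):
    # Expected categorical distribution under Dir(alpha): alpha_k / sum(alpha).
    a0 = sum(alpha)
    return [a / a0 for a in alpha]


def kl_divergence(p, q):
    # KL(p || q) for discrete distributions; terms with p_k == 0 contribute 0.
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def combined_loss(logits, label=None, alpha=(1.0, 1.0, 2.0), lam=0.1):
    """Cross-entropy (only if a gold stance index is given) plus lam * KL(p || prior mean).

    ASSUMPTION: the neutrality-biased prior (larger alpha on Neutral), lam, and the
    KL direction are illustrative guesses, not values reported in the paper.
    """
    p = softmax(logits)
    prior = dirichlet_mean(alpha)
    kl = kl_divergence(p, prior)
    ce = -math.log(p[label]) if label is not None else 0.0
    return ce + lam * kl


# Usage: a labeled example (gold stance = Supportive) and an unlabeled one.
labeled = combined_loss([2.0, -1.0, 0.5], label=STANCES.index("Supportive"))
unlabeled = combined_loss([2.0, -1.0, 0.5])
print(f"labeled loss = {labeled:.4f}, unlabeled loss = {unlabeled:.4f}")
```

For an unlabeled example only the KL term is active, so a prediction that exactly matches the prior mean (here 0.25, 0.25, 0.5) costs nothing, while confident non-Neutral predictions are penalized. A closed-form KL between a predicted Dirichlet and the prior Dirichlet, as used in evidential classification, would be an alternative reading of the same description.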
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: NLP for Social Good, Semi-Supervised Learning, Stance Detection and Argument Mining, Normative or Ethical Reasoning, Text Generation
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 5892