GazaVHR: AI-Driven Legally Grounded Conflict Harm Documentation

Published: 19 Jun 2025, Last Modified: 12 Jul 2025
Venue: 4th Muslims in ML Workshop, co-located with ICML 2025 (Poster)
License: CC BY 4.0
Submission Track: Track 1: Machine Learning Research by Muslim Authors
Keywords: Gaza, human rights violations, vision-language model, conflict imagery, image classification, dataset, semantic filtering, AI ethics, humanitarian computing, social media analysis
TL;DR: GazaVHR is a dataset of 4,603 AI-annotated Gaza conflict images, filtered from 176K via vision-language models and clustering. Aligned with expert-verified images, it enables scalable analysis of rights violations.
Abstract: We present GazaVHR, a vision-language model (VLM)-annotated dataset for fine-grained analysis of potential human rights violations in Gaza conflict imagery. Sourced from 145,662 conflict-related tweets, our pipeline integrates vision-language models, vision encoders, and semantic clustering to generate structured annotations with minimal manual intervention. Beginning with 176,731 raw images, a multi-stage filtering pipeline (content rules, deduplication, semantic clustering) identifies 13,834 visually unique instances that are most likely conflict-relevant. To ensure legal relevance, we align results with the Kanıt (Evidence) dataset: 231 expert-curated images grounded in the Rome Statute of the International Criminal Court (Articles 5–8). This framework refines the dataset to 4,603 high-confidence images likely indicative of conflict-related harm. While our work highlights AI's potential to systematize human rights documentation at scale, we acknowledge limitations in reduced manual oversight and biases inherent to LLM-based annotation and hashtag-driven social media data.
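As a rough illustration of the filtering and alignment steps sketched in the abstract, the Python snippet below embeds images with a vision encoder, removes near-duplicates via cosine similarity, and retains only images sufficiently similar to an expert-verified reference set (standing in for the Kanıt alignment step). The CLIP checkpoint, both thresholds, and the greedy deduplication are illustrative assumptions; the paper's actual encoder, clustering method, and parameters are not specified on this page.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical stand-in for the paper's vision encoder; the actual
# GazaVHR encoder and thresholds are not given on this page.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed(paths):
    """Return L2-normalized image embeddings for a list of file paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return F.normalize(feats, dim=-1)

def deduplicate(embs, threshold=0.95):
    """Greedy near-duplicate removal: keep an image only if its cosine
    similarity to every already-kept image is below the threshold."""
    keep = []
    for i in range(embs.size(0)):
        if all(float(embs[i] @ embs[j]) < threshold for j in keep):
            keep.append(i)
    return keep

def align_with_reference(embs, ref_embs, threshold=0.30):
    """Keep indices whose max similarity to any expert-verified
    reference embedding (e.g., the Kanit set) meets the threshold."""
    sims = embs @ ref_embs.T  # (N, R) cosine similarities
    return (sims.max(dim=1).values >= threshold).nonzero().flatten().tolist()
```

In a pipeline like the one described, `deduplicate` would run over the raw crawl before semantic clustering, and `align_with_reference` would implement the legal-relevance filter against the expert-curated embeddings; both threshold values here are placeholders that would need tuning against held-out annotations.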
Submission Number: 19