From Rules to Pixels: A Decoupled Framework for Segmenting Human-Centric Rule Violations

Published: 24 Nov 2025, Last Modified: 24 Nov 2025 | 5th Muslims in ML Workshop co-located with NeurIPS 2025 | CC BY 4.0
Keywords: policy violation, segmentation, LLM, referring expression segmentation, text-driven segmentation
TL;DR: LaGPS uses a large language model to translate complex human policies into a simple, step-by-step program that a vision model can execute for precise, explainable image segmentation.
Abstract: We introduce LaGPS, a neuro-symbolic framework that grounds long-form textual rules, such as cultural dress codes, by translating them into deterministic programs that segment rule violations\footnote{Here, "violation" is used in a strictly technical sense to denote pixels where a *user-specified* visual condition is not met; it carries no moral, cultural, or legal implication.}. Existing vision-language models struggle with this task because they cannot parse the compositional logic inherent in human rules. LaGPS overcomes this limitation with a two-stage architecture: a *Semantic Interpreter* that uses a large language model to compile free-form text into a structured program, and a *Symbolic Executor* that runs this program over a set of visual primitives (e.g., per-person body parts and skin masks) to produce precise segmentation masks. To evaluate this setting, we introduce the *Human-Centric Rule-violation Segmentation (HRS)* benchmark, a new 1,100-image dataset spanning diverse cultural contexts. LaGPS significantly outperforms baselines such as CLIPSeg, achieving a $+19.4\%$ absolute mIoU improvement. Our work demonstrates that this decoupled approach creates more transparent, accurate, and auditable systems for language-guided visual reasoning.
Track: Track 1: ML on Islamic Content / ML for Muslim Communities
Submission Number: 52
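
Illustrative sketch (not from the paper): the second stage described in the abstract, a Symbolic Executor that runs a structured program over visual primitive masks, might look roughly like the following. The program format (nested tuples with `prim`, `and`, `or`, `not`), the primitive names `arms` and `skin`, and the helper names `execute` and `iou` are all assumptions made for illustration; the abstract does not specify LaGPS's actual program language or primitives. The program is written by hand here, standing in for what the LLM-based Semantic Interpreter would emit for a rule such as "arms must be covered".

```python
import numpy as np

def execute(program, primitives):
    """Evaluate a nested-tuple program over boolean primitive masks.

    Hypothetical program format (an assumption, not the paper's):
      ("prim", name) | ("not", p) | ("and", p, q) | ("or", p, q)
    """
    op = program[0]
    if op == "prim":
        return primitives[program[1]]
    if op == "not":
        return ~execute(program[1], primitives)
    if op == "and":
        return execute(program[1], primitives) & execute(program[2], primitives)
    if op == "or":
        return execute(program[1], primitives) | execute(program[2], primitives)
    raise ValueError(f"unknown operation: {op!r}")

def iou(pred, gt):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    union = (pred | gt).sum()
    if union == 0:
        return 1.0
    return float((pred & gt).sum() / union)

# Toy 4x4 primitive masks; a real system would obtain these from vision models.
arms = np.zeros((4, 4), dtype=bool)
arms[1:3, 0] = True            # two "arm" pixels
skin = np.zeros((4, 4), dtype=bool)
skin[2, 0] = True              # exposed skin on the arm
skin[0, 1] = True              # exposed skin elsewhere (e.g. face)

# Hand-written stand-in for the interpreter's output for "arms must be covered":
# the violation region is the set of pixels that are both arm and visible skin.
program = ("and", ("prim", "arms"), ("prim", "skin"))
violation = execute(program, {"arms": arms, "skin": skin})

# Compare against a toy ground-truth mask that marks both arm pixels.
ground_truth = arms.copy()
score = iou(violation, ground_truth)
```

Because the executor is deterministic, the same program and primitives always yield the same mask, which is what makes each step of such a pipeline inspectable. Averaging `iou` over all evaluated masks would give an mIoU-style score, though the benchmark's exact averaging protocol is not stated in the abstract.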