Keywords: LLMs, reasoning, multimodal models, bias
TL;DR: LVLMs as reasoning annotators: labels plus rationales, calibrated confidence, and evidence pointers
Abstract: Large vision–language models (LVLMs) can function as algorithmic annotators by not only assigning labels to multimodal inputs but also generating structured reasoning traces that justify those labels. We introduce \textbf{Reasoning-as-Annotation (RaA)}, a paradigm in which an LVLM outputs a human-interpretable rationale, calibrated confidence, and evidence pointers alongside each label, effectively acting as both classifier and explainer. We evaluate RaA on bias detection in images using a curated dataset of $\sim$2,000 examples with human gold labels. Across closed- and open-source LVLMs, RaA preserves accuracy relative to black-box labeling while adding transparency: rationales were coherent and grounded in 75–90\% of cases, evidence pointers were auditable in 70–85\%, and confidence scores correlated with correctness ($r=0.60$–$0.76$). These results show that RaA is model-agnostic and maintains predictive quality while producing interpretable, auditable annotations. We position RaA as a scalable way to transform opaque labels into reusable reasoning traces for supervision and evaluation.
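For concreteness, a minimal sketch (not from the paper; all field names and values are illustrative assumptions) of how an RaA output, i.e. a label together with its rationale, calibrated confidence, and evidence pointers, could be represented as a structured record:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RaAAnnotation:
    """One hypothetical Reasoning-as-Annotation record: a label plus the
    reasoning trace that justifies it (field names are illustrative)."""
    label: str                  # predicted class, e.g. "biased" / "not_biased"
    rationale: str              # human-interpretable justification for the label
    confidence: float           # calibrated confidence in [0, 1]
    evidence_pointers: List[str] = field(default_factory=list)  # auditable references into the input

# Example record for a bias-detection annotation (values are made up):
example = RaAAnnotation(
    label="biased",
    rationale="The caption ties the depicted profession to a single gender.",
    confidence=0.82,
    evidence_pointers=["image_region: person_1", "caption_span: 'nurse ... she'"],
)
print(example)
```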
Track: Track 2: ML by Muslim Authors
Submission Number: 20