SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Educational AI, Multimodal Learning, Scientific Reasoning, Cognitive Modeling, Sketch-Based Assessment
TL;DR: A cognitive-theory-grounded multi-agent framework for evaluating student-drawn scientific sketches.
Abstract: Scientific sketches (e.g., models) offer a powerful lens into students' conceptual understanding, yet AI-powered automated assessment of such free-form, visually diverse artifacts remains a critical challenge. Existing solutions often reduce sketch evaluation to an image classification task or delegate it to monolithic vision-language models, approaches that lack interpretability, pedagogical alignment, and adaptability across cognitive levels. To address these limitations, we present SketchMind, a cognitively grounded, multi-agent framework for evaluating and improving student-drawn scientific sketches. SketchMind introduces Sketch Reasoning Graphs (SRGs), semantic graph representations that embed domain concepts and Bloom's-taxonomy-based cognitive labels. The system comprises modular agents responsible for rubric parsing, sketch perception, cognitive alignment, and iterative feedback with sketch modification, enabling personalized and transparent evaluation. We evaluate SketchMind on a curated dataset of 3,575 student-generated sketches across six science assessment items, each requiring students to draw a model to explain a phenomenon and each targeting a different highest Bloom's level. Compared to baseline GPT-4o performance without SRG (average accuracy: 55.6%), the model with SRG integration achieves 77.1% average accuracy (+21.4% average absolute gain). We also show that multi-agent orchestration with SRG further improves performance: SketchMind with GPT-4.1 achieves an average 8.9% increase in sketch prediction accuracy, outperforming single-agent pipelines on all items. Human evaluators rated the feedback and co-created sketches generated by SketchMind with GPT-4.1 at an average of 4.1 out of 5, significantly higher than those of baseline models (e.g., 2.3 for GPT-4o). Experts noted the system's potential to meaningfully support conceptual growth through guided revision. Our code and (pending approval) dataset will be released to support reproducibility and future research in AI-driven education.
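To make the described pipeline concrete, below is a minimal, hypothetical Python sketch of an SRG data structure and the four-agent flow (rubric parsing, sketch perception, cognitive alignment, iterative feedback) named in the abstract. All identifiers (SRGNode, parse_rubric, perceive, align, suggest_revision) are invented for illustration only and do not reflect the paper's actual interfaces; the rubric-parsing and perception agents, which in the real system would call language and vision-language models, are stubbed with fixed graphs so the example runs standalone.

```python
# Hypothetical illustration of the SRG + four-agent flow; all names are
# invented for exposition and are not SketchMind's actual API.
from dataclasses import dataclass, field
from enum import IntEnum


class Bloom(IntEnum):
    """Bloom's taxonomy levels used as cognitive labels on SRG nodes."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


@dataclass(frozen=True)
class SRGNode:
    concept: str   # domain concept depicted in the sketch
    bloom: Bloom   # cognitive level the concept evidences


@dataclass
class SRG:
    """Sketch Reasoning Graph: concept nodes plus labeled relations."""
    nodes: set[SRGNode] = field(default_factory=set)
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, rel, dst)


def parse_rubric(rubric_text: str) -> SRG:
    # Agent 1 (rubric parsing): in practice an LLM would map the rubric to a
    # reference SRG; stubbed here with a fixed target graph.
    return SRG(nodes={SRGNode("particle", Bloom.REMEMBER),
                      SRGNode("particle motion", Bloom.UNDERSTAND),
                      SRGNode("energy transfer", Bloom.ANALYZE)})


def perceive(sketch_image: bytes) -> SRG:
    # Agent 2 (sketch perception): in practice a VLM would extract the
    # student's SRG from the drawing; stubbed with a partial graph.
    return SRG(nodes={SRGNode("particle", Bloom.REMEMBER),
                      SRGNode("particle motion", Bloom.UNDERSTAND)})


def align(observed: SRG, target: SRG) -> tuple[float, set[SRGNode]]:
    # Agent 3 (cognitive alignment): score node overlap, collect the gaps.
    matched = observed.nodes & target.nodes
    missing = target.nodes - observed.nodes
    return len(matched) / max(len(target.nodes), 1), missing


def suggest_revision(missing: set[SRGNode]) -> str:
    # Agent 4 (feedback): turn unmatched concepts into revision guidance,
    # ordered from lower to higher cognitive demand.
    if not missing:
        return "Sketch covers all rubric concepts."
    return "Consider adding: " + ", ".join(
        f"{n.concept} (Bloom: {n.bloom.name.lower()})"
        for n in sorted(missing, key=lambda n: n.bloom))


score, missing = align(perceive(b""), parse_rubric(""))
print(f"alignment score: {score:.2f}")  # 0.67 for this stub
print(suggest_revision(missing))        # flags the missing 'energy transfer' node
```

In this toy version, alignment is plain node-set intersection; the feedback agent then drives the iterative revision loop described in the abstract by proposing the unmatched concepts back to the student.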
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 20898