Generate, Then Refine: Dual-Stage Verification-Guided Prompting for Mermaid Code Generation from Flowcharts

ACL ARR 2026 January Submission5560 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Vision–Language Models, Diagram Understanding, Structured Code Generation, Verification-Guided Prompting, Multimodal Reasoning, Flowchart-to-Code Translation
Abstract: Flowchart understanding has largely been evaluated through visual question answering (VQA), leaving structured diagram generation underexplored. We revisit FlowVQA and establish a new benchmark task: predicting executable Mermaid code directly from flowchart images. We propose a dual-stage verification-guided prompting (DSVGP) framework for vision–language models (VLMs): an actor that produces an initial Mermaid program and a critic that validates and repairs it using Mermaid-aware checks and graph constraints. Coupled with visualization-aware verification and graph-centric parsing, our evaluation measures executability and structure via Micro F1 (label-with-connections), Parsing Success Rate (PSR), Normalized Edit Similarity (NES), and node-wise scores. Across diverse contemporary VLMs, the proposed actor–critic prompting yields consistent and significant improvements over OCR & Graph Parsing, single-prompt, and two-step baselines, increasing both structural fidelity and executability of the generated code. These results indicate that VLMs can serve as robust diagram-to-code translators when guided by structured verification-driven prompting, and our benchmark provides a reproducible foundation for future research on flowchart-to-DSL generation.
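The abstract describes the actor and critic stages only at a high level. As a rough illustration of what a generate-then-refine loop of this kind could look like, here is a minimal Python sketch. It is not the authors' implementation: the `vlm` callable, both prompt strings, the `check_mermaid` helper, and the tiny Mermaid subset it accepts (one node or edge per line, edges between bare node ids) are all assumptions made for this example.

```python
import re
from typing import Callable, List

# Hypothetical patterns for a small Mermaid flowchart subset (an assumption,
# not the full Mermaid grammar): a header, node lines, and edge lines.
HEADER = re.compile(r"^\s*(?:flowchart|graph)\s+(?:TD|TB|BT|LR|RL)\s*$")
NODE = re.compile(r"^\s*(\w+)\s*[\[\(\{].+[\]\)\}]\s*$")          # e.g. A[Start]
EDGE = re.compile(r"^\s*(\w+)\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)\s*$")  # e.g. A -->|yes| B

# Hypothetical prompts; the paper's actual prompts are not given in the abstract.
ACTOR_PROMPT = "Transcribe this flowchart image into Mermaid code."
CRITIC_PROMPT = (
    "Here is Mermaid code for the flowchart image:\n{code}\n"
    "Problems found:\n{problems}\n"
    "Return a corrected Mermaid program."
)


def check_mermaid(code: str) -> List[str]:
    """Return problems found by simple syntax and graph-constraint checks."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    problems: List[str] = []
    if lines and HEADER.match(lines[0]):
        body = lines[1:]
    else:
        problems.append("missing 'flowchart <direction>' header")
        body = lines
    declared, endpoints = set(), set()
    for ln in body:
        if (m := NODE.match(ln)):
            declared.add(m.group(1))
        elif (m := EDGE.match(ln)):
            endpoints.update(m.groups())
        else:
            problems.append(f"unparseable line: {ln.strip()!r}")
    # Graph constraint: every edge endpoint must be a declared node.
    for name in sorted(endpoints - declared):
        problems.append(f"edge endpoint {name!r} is never declared")
    return problems


def generate_then_refine(
    image: object, vlm: Callable[[str, object], str], max_rounds: int = 2
) -> str:
    """Actor produces a draft; critic repairs it while the checks still fail."""
    code = vlm(ACTOR_PROMPT, image)
    for _ in range(max_rounds):
        problems = check_mermaid(code)
        if not problems:
            break
        prompt = CRITIC_PROMPT.format(code=code, problems="\n".join(problems))
        code = vlm(prompt, image)
    return code
```

In this sketch the critic is only invoked when a check fails, and `max_rounds` caps the number of repair attempts; whether the paper's critic runs unconditionally or iterates is not stated in the abstract.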
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality, Cross-modal content generation, Cross-modal application, Cross-modal information extraction
Contribution Types: NLP engineering experiment
Languages Studied: ENGLISH
Submission Number: 5560