DiagramDiff: A Diagram Reconstruction and Recognition Method to Enhance Large Language Models' Diagram Understanding
Keywords: Segmentation, Stroke Reconstruction, Recognition, Diagram, LLMs
TL;DR: We propose a diagram reconstruction and recognition method that enhances the diagram understanding of large language models (LLMs), and we construct the first diagram-based question answering and editing dataset.
Abstract: Diagrams are widely used in daily life. However, offline diagrams typically exist only as images and lack a structured data representation, which significantly limits their reusability and editability. Current research mainly focuses on supporting basic query tasks for online diagrams and does not meet the semantic understanding and interaction requirements of complex offline diagrams. Although large language models (LLMs) possess powerful reasoning and knowledge integration capabilities, they perform poorly on offline diagrams because they cannot accurately understand the structure and content of such diagrams. To address these issues, we propose DiagramDiff, a framework consisting of a high-precision diagram reconstruction model and an instance-level diagram element recognition model. The framework converts offline diagrams into standardized data structures, turning LLMs that previously could not interpret offline diagrams into intelligent assistants capable of semantic reasoning, logical validation, and efficient diagram editing. We have also constructed a dataset of diagrams paired with corresponding question-answering (Q&A) and editing tasks. Experiments demonstrate that DiagramDiff achieves state-of-the-art performance on diagram reconstruction and recognition, and significantly enhances LLMs' ability to understand and interact with offline diagrams.
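The abstract does not specify the schema of the "standardized data structures" that DiagramDiff produces. Purely as an illustrative sketch, assuming a hypothetical schema in which each recognized element instance carries a class label, a bounding box, and its text, and connectors are stored as directed edges, the Python below shows why such a representation is easier for an LLM to query and edit than raw pixels. The names `Element`, `Diagram`, `to_prompt`, `successors`, and `remove` are hypothetical and are not taken from the paper.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Element:
    # Hypothetical instance-level element (one recognized shape); not the paper's schema.
    id: int
    label: str     # assumed class name, e.g. "start", "process", "decision"
    bbox: tuple    # (x_min, y_min, x_max, y_max) in pixels
    text: str = "" # recognized text inside the element, if any


@dataclass
class Diagram:
    elements: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # directed (source_id, target_id) pairs

    def to_prompt(self) -> str:
        """Serialize the diagram as JSON text that can be placed in an LLM prompt."""
        return json.dumps(asdict(self), indent=2)

    def successors(self, node_id: int) -> list:
        """Query: ids of elements that node_id points to."""
        return [t for s, t in self.edges if s == node_id]

    def remove(self, node_id: int) -> None:
        """Edit: delete an element together with every edge that touches it."""
        self.elements = [e for e in self.elements if e.id != node_id]
        self.edges = [(s, t) for s, t in self.edges if node_id not in (s, t)]


if __name__ == "__main__":
    d = Diagram(
        elements=[Element(0, "start", (0, 0, 10, 10), "Start"),
                  Element(1, "process", (0, 20, 10, 30), "Step"),
                  Element(2, "end", (0, 40, 10, 50), "End")],
        edges=[(0, 1), (1, 2)],
    )
    print(d.successors(0))
    d.remove(1)
    print(d.to_prompt())
```

Once a diagram is in this form, queries and edits become simple operations on lists and edges rather than pixel manipulation, which is the kind of interaction the abstract describes; the actual DiagramDiff output format may differ.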
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6853