Abstract: Flowcharts serve as integral visual aids, encapsulating both logical flows and specific component-level information in a manner easily interpretable by humans. However, automated parsing of these diagrams poses a significant challenge due to their intricate logical structure and text-rich nature. In this paper, we introduce GenFlowchart, a novel framework that employs generative AI to enhance the parsing and understanding of flowcharts. First, the Segment Anything Model (SAM) is used to delineate the components and geometric shapes within the flowchart. Second, Optical Character Recognition (OCR) is used to extract the text residing in each component for deeper functional comprehension. Finally, we use prompt engineering to formulate prompts that allow the generative AI to integrate the segmentation results with the extracted text, thereby reconstructing the flowchart’s workflows. To validate the effectiveness of GenFlowchart, we evaluate its performance across multiple flowcharts and benchmark it against several baseline approaches. GenFlowchart is available at https://github.com/ResponsibleAILab/GenFlowchart.
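As an illustration of the final step only, the following is a minimal sketch of how segmented components and their OCR text might be serialized into a prompt for a generative model. It is not the GenFlowchart implementation: the `Component` record, its fields, the `build_prompt` helper, the reading-order heuristic, and the prompt wording are all assumptions made for this example, and the segmentation and OCR outputs are hard-coded rather than produced by SAM or an OCR engine.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One segmented flowchart region (hypothetical record, not the paper's schema)."""
    cid: int
    shape: str                        # e.g. "rectangle", "diamond", "arrow"
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in pixels
    text: str                         # OCR output for the region; may be empty


def build_prompt(components: list[Component]) -> str:
    """Serialize components in top-to-bottom, left-to-right order into a text prompt."""
    # Assumed heuristic: sort by the top edge, then by the left edge.
    ordered = sorted(components, key=lambda c: (c.bbox[1], c.bbox[0]))
    lines = [
        "The following components were detected in a flowchart image.",
        "Reconstruct the workflow as an ordered list of steps and decisions.",
        "",
    ]
    for c in ordered:
        x, y, w, h = c.bbox
        label = c.text.strip() or "(no text)"
        lines.append(
            f'- id={c.cid} shape={c.shape} bbox=({x},{y},{w},{h}) text="{label}"'
        )
    return "\n".join(lines)


# Hard-coded stand-ins for segmentation and OCR results.
components = [
    Component(2, "diamond", (120, 200, 160, 80), "Is input valid?"),
    Component(1, "rectangle", (120, 40, 160, 60), "Read input"),
    Component(3, "arrow", (195, 100, 10, 100), ""),
]
prompt = build_prompt(components)
print(prompt)
```

Passing bounding boxes alongside the text gives the model spatial cues for inferring which components are connected; whether coordinates, adjacency lists, or another encoding works best is a design choice this sketch does not settle.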