Abstract: Manual analysis of diagrams and legend sheets in engineering projects is time-consuming and calls for automation. The lack of standardized legend formats complicates the development of a general method for automated information extraction, and existing approaches require training and custom rules for each project. This study proposes a novel solution that combines optical character recognition (OCR) with vision-language models (VLMs) and multimodal prompt engineering to automate information extraction from diverse legend sheets without training. Unlike studies that focus only on diagrams, it integrates legend information with information extracted from the diagrams. Our study shows that VLMs, guided by multimodal prompts, can accurately extract information from diverse legend sheets, enabling automatic information extraction from diagrams across engineering projects. We validate the method through a case study involving the extraction of instruments from piping and instrumentation diagrams (P&IDs) and their legends across three projects with varied formats and standards. The proposed method achieved 100% accuracy in legend classification and information extraction, and 99.68% precision and 95.91% recall in generating instrument listings. These results demonstrate that the approach improves both the accuracy and the efficiency of information extraction from diagrams. The method can be adapted to different legend formats and diagrams, providing a versatile solution for various industries.
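For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision and recall could be computed for an extracted instrument listing against a ground-truth listing. It is not the authors' evaluation code: the function name `precision_recall` and the instrument tags are hypothetical and chosen only for illustration, and it assumes each instrument is identified by a unique tag string.

```python
def precision_recall(extracted, ground_truth):
    """Return (precision, recall) for two collections of instrument tags.

    Assumption: a tag counts as correct only on an exact string match.
    """
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    # Precision: share of extracted tags that are really in the diagram.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of real tags that the extraction found.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical tags, for illustration only.
truth = {"FT-101", "PT-102", "TT-103", "LT-104"}
found = {"FT-101", "PT-102", "TT-103", "XX-999"}
p, r = precision_recall(found, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

High precision with lower recall, as in the reported figures, means that nearly every listed instrument is correct while a small share of instruments in the diagrams is missed.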
External IDs: doi:10.1002/smr.70072