Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Chemical Process Engineering, Process Flow Diagrams, Piping and Instrumentation Diagrams, Small Language Models, Retrieval-Augmented Generation, Physics-Aware Simulation, Inference Optimization (FlashAttention, Lookahead Decoding, PagedAttention), Process Scale-Up
Abstract: Recent advances in generative AI have accelerated the discovery of novel chemicals and materials. However, scaling these discoveries to industrial production remains a major bottleneck due to the synthesis gap: the need to develop entirely new manufacturing processes. This challenge requires detailed engineering blueprints: Process Flow Diagrams (PFDs) for equipment layouts and material/energy flows, and Piping and Instrumentation Diagrams (PIDs) for process plant operations. Current AI systems cannot yet reliably generate these critical engineering schematics, creating a fundamental obstacle to manufacturing scale-up of novel discoveries. We present a closed-loop, physics-aware framework for automated generation of industrially viable PFDs and PIDs. The framework integrates three key components: (1) domain-specialized small language models (SLMs) trained for auto-generation of PFDs and PIDs, (2) a hierarchical knowledge graph of process flow and instrumentation descriptions for 1,020+ chemicals, used for Graph Retrieval-Augmented Generation (GRAG), and (3) an open-source chemical process simulator for modeling, simulation, optimization, and analysis of novel chemical processes. The SLMs are trained through a multi-stage pipeline combining Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Retrieval-Augmented Instruction Tuning (RAIT) on synthetic datasets, with process simulator-in-the-loop validation to check feasibility. To improve computational efficiency, the framework applies structural pruning (width and depth) guided by importance heuristics to reduce language model size while preserving accuracy, followed by inference optimizations including FlashAttention, Lookahead Decoding, PagedAttention with KV-cache quantization, and Test-Time Inference Scaling.
Experimental results demonstrate that our framework generates simulator-validated process descriptions with high fidelity, outperforms baseline methods in correctness, and generalizes effectively to unseen chemicals. By bridging AI-driven molecular and material design with industrial-scale feasibility, this work significantly accelerates the path-to-production for AI-discovered chemicals.
Submission Number: 91