Keywords: Industrial Control Systems, Programmable Logic Controllers, Data Augmentation, Code Generation
Abstract: Recent advances in LLMs and industrial copilots (e.g., from Siemens, Rockwell and Schneider) have the potential to transform the way control engineers program. However, generating verifiable industrial code (e.g., free from both syntactic and logical errors) via LLMs remains inherently challenging because of strict safety constraints and the intolerance for logical failures. The closed-source nature and scarcity of data for Industrial Control System (ICS) programming tasks exacerbate this difficulty, preventing LLMs from reaching their transformative potential in ICSs. To address this critical gap, we introduce PLC-Spec-Syn, the first evolutionary framework to generate high-fidelity PLC programming tasks. Each task consists of a detailed specification (a structured, natural language engineering document) and its corresponding verified PLC code. The core idea is to guide LLM-based task generation (specification–code pairs) with practical industrial engineering principles through a multi-axis evolutionary process that considers six dimensions: functionality, safety, performance, maintenance, interoperability, and contextual complication. To ensure data quality, each generated specification–code pair undergoes rigorous auditing, including a compilation check and formal verification of semantic consistency between the specification and the code. The whole process yields PLC-Spec-Code, the first large-scale corpus of 11,669 PLC programming tasks with strict quality control. In addition, PLC-Spec-Code has 84.3% syntactic diversity, substantially exceeding that of existing corpora such as OSCAT (29.2%). Importantly, fine-tuning multiple (code) LLMs on our corpus improves their performance on verifiable PLC code generation for unseen tasks by an average of 16.4% compared to the previous models, confirming the effectiveness of our task generation approach and the practical usefulness of our corpus.
Primary Area: datasets and benchmarks
Submission Number: 15685