PRESTO: A Framework for Orchestrating System States and Test Cases for Bash Script Verification

ICLR 2026 Conference Submission 16323 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Bash Script Generation, Code Verification
Abstract: Bash is a widely used scripting language for automating system and cloud tasks, but its reliance on implicit preconditions—such as environment variables, file paths, and tool availability—makes it error-prone, especially when scripts are generated by large language models (LLMs). While LLMs have demonstrated promising capabilities in translating natural language to Bash scripts, the lack of reliable evaluation methods and test coverage hampers their practical utility. We introduce PRESTO, a modular framework for Precondition-aware Script Testing and Orchestration, designed to assess and refine Bash scripts through execution-driven feedback loops. PRESTO automatically infers required preconditions, synthesizes minimal reproducible environments, generates targeted test cases, and evaluates the behavior of both LLM-generated and human-authored Bash scripts in a sandboxed execution environment. Upon failure, an iterative refinement cycle—driven by LLMs—updates the script, environment setup, or test harness until correctness is restored. Our experiments on two public benchmarks show that PRESTO significantly improves correctness, debugging efficiency and reliability compared to static or heuristic methods. Unlike reference-based metrics, PRESTO operates without requiring gold-standard references, making it suitable for real-world deployment scenarios. This positions PRESTO as a practical solution for production-ready script generation.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16323