Abstract: Large language models (LLMs) are increasingly used for document and code editing, yet standard approaches typically lack formal assurances about the scope, structure, or side effects of their modifications. We introduce \emph{Functional Safety}, a hierarchy-aware editing architecture that formalizes LLM-driven edits as typed plans over explicit hierarchies with deterministic execution. A stochastic planning stage operates on an explicit hierarchical representation and emits a structured plan of typed operations that separate structural reorganization from bounded content generation. We analyze each step with two footprints: a \emph{structural footprint} (nodes whose relations may change) and a \emph{payload footprint} (nodes whose local content may change), and the guarantees are scoped per step. Execution is performed by a deterministic, structure-constrained component that enforces locality, guards protected regions, preserves byte-for-byte payload outside each step's payload footprint, and confines structural changes to each step's structural footprint under the stated assumptions. We formalize the architecture, specify its invariants, and prove Deterministic Safety and Conditional Functional Safety theorems that bound side effects under those assumptions. Empirical evaluations on long-form document rewriting, code refactoring, and multi-page policy briefs show that Functional Safety substantially reduces side effects relative to ReAct-style tool agents reflecting current agentic editor practice. These results demonstrate that principles from functional programming—explicit structure, composability, and controlled side effects—provide a rigorous foundation for reliable LLM-driven editing.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: This revision incorporates feedback from three anonymous reviewers. In response to one reviewer, we increased the sample sizes by adding documents across all experiments. Table 2 has been revised to present the tradeoffs more clearly, and the discussion of failure modes has been rewritten for clarity.
Assigned Action Editor: ~Yinpeng_Dong2
Submission Number: 7873