Keywords: Grounding, Natural Language Grounding, Temporal Logic, Linear Temporal Logic, NL-to-TL, Verification, Translation, Formalization
TL;DR: We introduce GinSign, a framework for grounding NL into system signatures for TL translation. By framing grounding as structured classification, GinSign outperforms prior work with 95.5% grounded equivalence and generalizes across domains.
Abstract: Natural language (NL) to temporal logic (TL) translation enables engineers to specify, verify, and enforce system behaviors without manually crafting formal specifications, an essential capability for building trustworthy autonomous systems. While existing NL-to-TL translation frameworks have demonstrated encouraging initial results, these systems either explicitly assume access to accurate atom grounding or suffer from low grounded translation accuracy. In this paper, we propose a framework for Grounding Natural Language Into System Signatures for Temporal Logic translation called GinSign. The framework introduces a grounding model that learns the abstract task of mapping NL spans onto a given system signature: given a lifted NL specification and a system signature $\mathcal{S}$, the classifier must assign each lifted atomic proposition to an element of the set of signature-defined atoms $\mathcal{P}$. We decompose the grounding task hierarchically: first predicting predicate labels, then selecting the appropriately typed constant arguments. Decomposing this task from a free-form generation problem into a structured classification problem permits the use of smaller masked language models and eliminates the reliance on expensive LLMs. Moreover, since the grounding is captured as an abstract task without hard-coding the state space, our approach can generalize to new (or modified) state spaces without retraining. Experiments across multiple domains show that frameworks which omit grounding tend to produce syntactically correct lifted LTL that is semantically nonequivalent to grounded target expressions, whereas our framework supports downstream model checking and achieves grounded logical-equivalence scores of 95.5%, a $1.4\times$ improvement over SOTA.
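The abstract frames grounding as hierarchical structured classification over a system signature: first choose a predicate for the lifted atomic proposition, then fill each of its typed argument slots from the signature's constants. The sketch below is a minimal, hypothetical illustration of that two-step decomposition in Python; the signature, the predicate and constant names, and the lexical-overlap scorer are illustrative assumptions standing in for the paper's learned masked-language-model classifiers, not the authors' implementation.

```python
# Hypothetical sketch of hierarchical grounding over a system signature.
# A trivial token-overlap scorer stands in for the learned classifier.
from dataclasses import dataclass

@dataclass
class Predicate:
    name: str
    arg_types: tuple  # ordered types of the predicate's argument slots

@dataclass
class Signature:
    predicates: list  # list of Predicate
    constants: dict   # constant name -> type

def score(span: str, symbol: str) -> int:
    """Stand-in scorer: count tokens shared by the NL span and a symbol name."""
    span_tokens = set(span.lower().replace("_", " ").split())
    sym_tokens = set(symbol.lower().replace("_", " ").split())
    return len(span_tokens & sym_tokens)

def ground_atom(span: str, sig: Signature) -> str:
    """Map a lifted atomic-proposition span to a signature-defined atom."""
    # Step 1: predicate classification over the signature's predicates.
    pred = max(sig.predicates, key=lambda p: score(span, p.name))
    # Step 2: for each typed slot, classify only among constants of that type.
    args = []
    for t in pred.arg_types:
        candidates = [c for c, ct in sig.constants.items() if ct == t]
        args.append(max(candidates, key=lambda c: score(span, c)))
    return f"{pred.name}({', '.join(args)})"

# Illustrative signature and lifted span (names are not from the paper).
sig = Signature(
    predicates=[Predicate("at", ("robot", "location")),
                Predicate("holding", ("robot", "object"))],
    constants={"rover_1": "robot",
               "charging_station": "location",
               "sample_box": "object"},
)
print(ground_atom("the rover is at the charging station", sig))
# -> at(rover_1, charging_station)
```

Restricting each argument slot to constants of the matching type is what turns grounding into a small, typed classification problem rather than free-form generation, which is the property the abstract credits for supporting smaller masked language models and generalization to new state spaces.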
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 20652