Abstract: Falsification, which aims to detect unsafe behaviors of cyber-physical systems (CPS) that violate signal temporal logic (STL) specifications, has been actively investigated over the past decade. Although numerous falsification approaches have been proposed, the falsification community suffers from a shortage of benchmarks, which hinders a thorough assessment of those approaches. In this article, we bridge this gap by proposing an automated approach for generating falsification benchmarks. Our approach is data-driven: first, we generate different time-variant traces (acting as system output traces) that satisfy a given STL specification, and we associate these with corresponding system input traces; then, we use these input and output traces to train an LSTM model that generalizes them. The resulting models can serve as benchmarks for assessing falsification approaches against the given specification. In the experimental evaluation, we validate the generated models by measuring their ability to differentiate the performance of different falsification approaches. Our generated models expose strengths and weaknesses of all the considered falsification approaches, something that the benchmarks currently used in the falsification community do not achieve. These results demonstrate the usefulness of our approach and can potentially push forward subsequent research in falsification.
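To make the falsification setting concrete, the following minimal sketch shows what it means to search for an input whose output trace violates an STL specification. It is an illustration only, not the approach of the article: the specification is the simple formula "always x < threshold" evaluated with the standard quantitative (robustness) semantics, `toy_system` is a hypothetical first-order lag standing in for a model under test (in the article this role is played by a trained LSTM), and plain random search stands in for a real falsification algorithm. All function names and parameters are assumptions introduced here.

```python
import random


def robustness_always_below(trace, threshold):
    """Robustness of the STL formula G (x < threshold) on a finite discrete trace.

    A positive value means the trace satisfies the formula; a negative value
    means it violates the formula.
    """
    return min(threshold - x for x in trace)


def toy_system(inputs, gain=0.5):
    """Hypothetical model under test: a first-order lag driven by the input trace."""
    state, outputs = 0.0, []
    for u in inputs:
        state += gain * (u - state)
        outputs.append(state)
    return outputs


def random_falsify(system, threshold, horizon=20, budget=200, seed=0):
    """Random-search falsification: sample input traces, keep the lowest robustness."""
    rng = random.Random(seed)
    best_inputs, best_rob = None, float("inf")
    for _ in range(budget):
        inputs = [rng.uniform(0.0, 1.0) for _ in range(horizon)]
        rob = robustness_always_below(system(inputs), threshold)
        if rob < best_rob:
            best_inputs, best_rob = inputs, rob
        if best_rob < 0:  # a violating input trace has been found
            break
    return best_inputs, best_rob


if __name__ == "__main__":
    _, rob_easy = random_falsify(toy_system, threshold=0.1)
    _, rob_hard = random_falsify(toy_system, threshold=2.0)
    print("threshold 0.1 falsified:", rob_easy < 0)
    print("threshold 2.0 falsified:", rob_hard < 0)
```

In this toy setting the outputs never exceed the largest input, so a threshold above 1 cannot be falsified while a very low threshold is falsified almost immediately. A useful benchmark sits between these extremes, where different falsification approaches succeed or fail at different rates.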
External IDs: dblp:journals/tcad/YanLZAZ25