Deducing Matching Strings for Real-World Regular Expressions

Published: 2023, Last Modified: 27 Jan 2026. Venue: SETTA 2023. License: CC BY-SA 4.0
Abstract: Real-world regular expressions (regexes for short) have a wide range of applications in software. However, support for regexes in test generation is insufficient. For example, existing works lack support for some important features such as lookbehind, are not resilient to subtle semantic differences (such as partial versus full matching), and fall short on Unicode support, which leads to loss of test coverage or missed bugs. To address these challenges, we propose a novel semantic model that comprehensively models the extended features of regexes, with awareness of different matching semantics (i.e., partial/full matching) and matching precedence (i.e., greedy/lazy matching). To the best of our knowledge, this is the first attempt to consider partial/full matching semantics in modeling and to support lookbehind. Leveraging this model, we then develop PowerGen, a tool for deducing matching strings for regexes that effectively generates random matching strings from the input regex. We evaluate PowerGen against nine related state-of-the-art tools. The evaluation results show the high effectiveness and efficiency of PowerGen.
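
The semantic distinctions named in the abstract can be illustrated with a short sketch. It uses Python's standard `re` module purely as an example regex engine; it is not PowerGen and does not reflect the paper's semantic model. The patterns and input strings are made up for illustration.

```python
import re

# Partial vs. full matching: the same regex accepts different strings
# depending on which matching semantics the engine call applies.
three_digits = re.compile(r"\d{3}")
partial_hit = three_digits.search("ab123cd") is not None    # substring match
full_hit = three_digits.fullmatch("ab123cd") is not None    # whole-string match

# Greedy vs. lazy matching: both patterns match the input,
# but the captured substring differs.
greedy = re.match(r"<(.+)>", "<a><b>").group(1)
lazy = re.match(r"<(.+?)>", "<a><b>").group(1)

# Lookbehind: the digits match only when preceded by a dollar sign,
# and the dollar sign itself is not part of the match.
with_prefix = re.search(r"(?<=\$)\d+", "cost: $42")
without_prefix = re.search(r"(?<=\$)\d+", "cost: 42")

# Unicode: \w covers non-ASCII letters by default for str patterns,
# but not when the ASCII flag is set.
unicode_hit = re.fullmatch(r"\w+", "naïve") is not None
ascii_hit = re.fullmatch(r"\w+", "naïve", re.ASCII) is not None
```

A string generator that ignores any one of these distinctions can produce inputs that the target engine rejects, which is the kind of coverage loss the abstract describes.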