Generating Behavior-Driven Development (BDD) Artifacts

16 Feb 2026 (modified: 02 Apr 2026), Submitted to AIware 2026, Readers: Everyone, License: CC BY 4.0
Keywords: Behavior-Driven Development (BDD), Large Language Models
Abstract: Behavior-Driven Development (BDD) specifies system behavior through scenarios, which can serve both as executable specifications and as automated test cases. In practice, BDD scenarios are created alongside large volumes of semi-structured or unstructured records (e.g., textual documentation, issue discussions, and informal feature descriptions). As a result, producing BDD scenarios and their accompanying records by hand is often labor-intensive and error-prone. This paper investigates bidirectional generation between semi-structured or unstructured textual records and structured BDD scenarios using large language models. We compare (i) fine-tuning of CodeT5+ encoder-decoder models on aligned (record, scenario) and (scenario, record) pairs with (ii) retrieval-augmented few-shot prompting using Meta-Llama-3.1-8B, CodeT5+ models, and DeepSeek-Coder. Experiments on a curated dataset of 2,100 aligned pairs show that generation quality is influenced by task direction and context management strategy, and that prefixing benefits tasks requiring strict structural output. Fine-tuned models achieve strong record-generation performance, with a best BLEU/F1 of 0.9394/0.9549 (CodeT5p-770m, unprefixed-truncating) and a best Exact Match of 0.8119 (CodeT5p-770m, prefixed-summarizing).
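Illustrative sketch (not from the paper): the abstract reports Exact Match and F1 without stating how they are computed. The Python below shows one common way such metrics are defined for generated text, assuming whitespace tokenization and whitespace-normalized comparison. The function names `exact_match` and `token_f1` are hypothetical, and the authors' actual definitions may differ.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """True when the two strings are equal after collapsing whitespace.

    Whitespace normalization is an assumption made for this sketch.
    """
    return " ".join(prediction.split()) == " ".join(reference.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 over whitespace tokens, counting repeated tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection gives the number of shared tokens.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a generated record that differs from its reference by a single token keeps a high F1 but scores zero on Exact Match, which may be why the two metrics in the abstract are led by different configurations.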
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public.
Paper Type: Full-length papers (i.e. case studies, theoretical, applied research papers). 8 pages
Submission Number: 46