Text2Stories: Evaluating the Alignment Between Stakeholder Interviews and User Stories

ACL ARR 2026 January Submission 5732 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Requirements Engineering, User Story Evaluation, Text Alignment, Source-Grounded Evaluation, LLM-as-a-Judge
Abstract: Software requirements are derived from natural language inputs such as the transcripts of elicitation interviews. However, evaluating whether those derived requirements faithfully reflect the stakeholders’ needs remains a challenging manual task. We introduce Text2Stories, a task and a set of metrics for text-to-story alignment that quantify the extent to which requirements (in the form of user stories) match the actual needs expressed by the elicitation session participants. Given an interview transcript and a set of user stories, our metrics quantify (i) correctness: the proportion of stories supported by the transcript, and (ii) completeness: the proportion of the transcript supported by at least one story. We segment the transcript into text chunks and cast the alignment as a matching problem between chunks and stories. Experiments over four datasets show that an LLM-based matcher achieves 0.86 macro-F1 on manually labeled chunk–story pairs, while embedding models alone lag behind but enable effective blocking. Automated alignment decisions enable the downstream computation of completeness and correctness. We show that these two metrics behave consistently with human judgment across multiple datasets, positioning Text2Stories as a scalable, source-faithful complement to existing user-story quality criteria.
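The two metrics defined in the abstract can be illustrated with a minimal sketch. Assuming (hypothetically; the function and variable names below are not from the paper) that chunk–story alignment decisions are available as a boolean matrix `match[i][j]`, true when transcript chunk `i` supports user story `j`, correctness and completeness reduce to column and row coverage rates:

```python
# Hypothetical sketch of the correctness/completeness computation,
# assuming alignment decisions are given as a boolean chunk-by-story matrix.

def correctness(match):
    """Proportion of stories supported by at least one transcript chunk."""
    n_stories = len(match[0])
    supported = sum(any(row[j] for row in match) for j in range(n_stories))
    return supported / n_stories

def completeness(match):
    """Proportion of transcript chunks supported by at least one story."""
    covered = sum(any(row) for row in match)
    return covered / len(match)

# 3 chunks x 2 stories: chunk 0 supports story 0, chunk 2 supports story 1
match = [[True, False], [False, False], [False, True]]
print(correctness(match))   # 1.0: both stories are supported
print(completeness(match))  # 2/3: chunk 1 is covered by no story
```

In practice the matrix entries would come from the LLM-based matcher described in the abstract, with embedding-based blocking used to prune the candidate chunk–story pairs before matching.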
Paper Type: Long
Research Area: Semantics: Lexical, Sentence-level Semantics, Textual Inference and Other areas
Research Area Keywords: word/phrase alignment, textual entailment, semantic textual similarity, phrase/sentence embedding
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 5732