ExpertGenQA: Open-Ended QA Generation in Specialized Domains

ACL ARR 2025 February Submission5649 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Generating high-quality question-answer pairs for specialized technical domains remains challenging, with existing approaches facing a tradeoff between leveraging expert examples and achieving topical diversity. We present ExpertGenQA, a protocol that combines few-shot learning with structured topic and style categorization to generate comprehensive domain-specific QA pairs. Using U.S. Federal Railroad Administration documents as a test bed, we demonstrate that ExpertGenQA achieves twice the efficiency of baseline few-shot approaches while maintaining 94.4% topic coverage. Through systematic evaluation, we show that current LLM-based judges and reward models exhibit strong bias toward superficial writing styles rather than content quality. Our analysis using Bloom's Taxonomy reveals that ExpertGenQA better preserves the cognitive complexity distribution of expert-written questions compared to template-based approaches. When used to train retrieval models, our generated queries improve top-1 accuracy by 13.02% over baseline performance, demonstrating their effectiveness for downstream applications in technical domains.
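The abstract describes a protocol that buckets expert-written QA examples by topic and question style, then conditions each generation call on few-shot examples from one bucket. A minimal sketch of that loop is below; all names (`bucket_examples`, `stub_llm`, the dictionary fields) and the placeholder generator are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an ExpertGenQA-style generation loop: expert QA pairs
# are grouped by (topic, style), and each generation call is conditioned on
# few-shot examples drawn from a single bucket. The "LLM" here is a stub.
from collections import defaultdict
from itertools import product


def bucket_examples(expert_examples):
    """Group expert QA pairs by (topic, style) for few-shot conditioning."""
    buckets = defaultdict(list)
    for ex in expert_examples:
        buckets[(ex["topic"], ex["style"])].append(ex)
    return buckets


def stub_llm(prompt):
    # Stand-in for a real LLM call; returns a placeholder QA pair.
    return {"question": f"Q given [{prompt}]", "answer": f"A given [{prompt}]"}


def generate_qa(expert_examples, styles):
    """Generate one QA pair per (topic, style) bucket that has expert shots."""
    buckets = bucket_examples(expert_examples)
    topics = sorted({ex["topic"] for ex in expert_examples})
    generated = []
    for topic, style in product(topics, styles):
        shots = buckets.get((topic, style), [])
        if not shots:
            continue  # no expert examples to condition on for this bucket
        prompt = f"topic={topic}, style={style}, shots={len(shots)}"
        generated.append({"topic": topic, "style": style, **stub_llm(prompt)})
    return generated


def topic_coverage(generated, topics):
    """Fraction of target topics that received at least one generated question."""
    covered = {qa["topic"] for qa in generated}
    return len(covered & set(topics)) / len(topics)
```

The `topic_coverage` helper mirrors the kind of metric behind the 94.4% topic-coverage figure: the share of target topics for which at least one question was produced.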
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: question generation, knowledge base QA, semantic parsing, interpretability, generalization, reasoning, few-shot QA
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English
Submission Number: 5649