HearingQA: A Dataset of Question and Answers in U.S. Committee Hearings

ACL ARR 2025 May Submission 6641 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Existing literature on the quality of questions and answers focuses on factual question answering and natural language understanding. However, there are no large-scale datasets enabling the study of question and answer quality in public hearings and interview-like settings. One challenge in constructing such a dataset is that public hearings and interviews are typically accessible only as unstructured plain-text transcripts. We develop a pipeline that extracts utterances and labels them as questions and answers, which can then be used for downstream tasks that study question and answer quality along different dimensions. Using this pipeline, we build a novel dataset from committee hearings of the U.S. House of Representatives and Senate, consisting of all questions and answers in transcripts from the 108th to the 117th congressional sessions. We find that it is possible to accurately distinguish the party affiliations of the questioners based on the question utterances alone, indicating that committee members with different party affiliations use language differently when asking questions in committee hearings.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Dialogue and Interactive Systems,Discourse and Pragmatics,Information Extraction
Contribution Types: Data resources
Languages Studied: English
Submission Number: 6641