Abstract: Multi-Hop Question Answering (MHQA) is crucial for evaluating a model's capability to integrate information from diverse sources. However, creating extensive, high-quality MHQA datasets is challenging: (i) manual annotation is expensive, and (ii) current synthesis methods often produce simplistic questions or require extensive manual guidance. This paper introduces HopWeaver, the first automatic framework for synthesizing authentic multi-hop questions from unstructured text corpora without human intervention. HopWeaver synthesizes two types of multi-hop questions (bridge and comparison) using an innovative approach that identifies complementary documents across corpora. Its coherent pipeline constructs authentic reasoning paths that integrate information from multiple documents, ensuring that the synthesized questions necessitate genuine multi-hop reasoning. We further present a comprehensive system for evaluating synthesized multi-hop questions. Empirical evaluations demonstrate that the synthesized questions are of comparable or superior quality to human-annotated datasets, at a lower cost. Our approach is valuable for developing MHQA datasets in specialized domains where annotated resources are scarce.
Paper Type: Long
Research Area: Generation
Research Area Keywords: retrieval-augmented generation, interactive and collaborative generation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Previous URL: https://openreview.net/forum?id=RYuFmE130I
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: We respectfully request a new AC and reviewers. The previous review cycle revealed a fundamental misalignment regarding the paper's scope. One reviewer's feedback focused on experiments beyond our stated contributions, while another key concern remained unaddressed despite our detailed rebuttal. The AC's meta-review largely repeated these initial views, assigning an aggregate score without addressing significant parts of our rebuttal. Given the limited feedback and potentially biased perspectives from the previous cycle, we believe a fresh evaluation is necessary for a fair assessment.
Software: zip
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Yes, we cite the creators of all artifacts used. Citations for baseline datasets and models are provided in Section 5 and Appendix G, with full details in the References section.
B2 Discuss The License For Artifacts: No
B2 Elaboration: No, we did not discuss licenses as the artifacts used are standard academic benchmarks and models intended for research purposes.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Our use of baseline datasets for benchmarking, as shown in Section 5.1, is consistent with their intended use. The intended purpose of our created artifact, HopWeaver, is detailed in the Abstract and Introduction.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No, this was not discussed as our work relies on the public English Wikipedia corpus (as stated in Appendix G), which primarily contains information about public figures and entities.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: The paper itself serves as documentation for our framework: Section 3 details the design, Section 2 defines the question types, and Appendix G specifies the English Wikipedia data source.
B6 Statistics For Data: Yes
B6 Elaboration: Appendix A.
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 5, Appendix C and G.
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix F and G.
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5, Appendix B and E.
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix G.
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: No
D1 Elaboration: The human validation was a small-scale study conducted with three Master's students in Computer Science who served as expert evaluators. The instructions consisted of a direct request to perform pairwise comparisons based on the detailed evaluation criteria already presented in the paper's appendix (Appendix E of the provided PDF). Given the evaluators' expertise and the straightforward nature of the task, a separate, lengthy instruction document was not created.
D2 Recruitment And Payment: No
D2 Elaboration: The three human evaluators were Master's students from our research group who participated as part of their academic research activities. They were not recruited through a formal process or crowdsourcing platform, and no monetary payment was provided for this specific validation task.
D3 Data Consent: No
D3 Elaboration: The participants were student co-authors and collaborators on this research project. They provided verbal consent to participate in the validation study. The purpose and use of their evaluation data within this paper were fully understood and agreed upon as part of the collaborative research process.
D4 Ethics Review Board Approval: No
D4 Elaboration: Formal ethics review board approval was not sought for this part of the study. The protocol involved a small number of expert evaluators (student collaborators) performing a low-risk, non-sensitive task of evaluating text quality. This type of internal validation activity generally does not require formal ethics board approval at our institution.
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Yes, in Appendix B.2, we specify that the evaluators were "Three Master's students in Computer Science."
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 206