MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation

ACL ARR 2025 May Submission632 Authors

14 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0

Abstract: Existing RAG benchmarks often overlook query difficulty, leading to inflated performance on simpler questions and unreliable evaluations. A robust benchmark dataset must satisfy three key criteria: quality, ensuring complete and reliable ground truth (GT) responses; diversity, expanding semantic coverage to prevent overfitting; and difficulty, capturing the complexity of reasoning based on hops and the distribution of supporting evidence. However, current benchmarks lack a systematic approach to defining and controlling query difficulty at a fine-grained level. To address this, we propose MHTS (Multi-Hop Tree Structure), a novel dataset synthesis framework that systematically controls multi-hop reasoning complexity by leveraging a multi-hop tree structure to generate logically connected, multi-chunk queries. Our fine-grained difficulty estimation formula exhibits a strong correlation with the overall performance metrics of a RAG system, validating its effectiveness in assessing both retrieval and answer generation capabilities. By ensuring high-quality, diverse, and difficulty-controlled queries, our approach enhances RAG evaluation and benchmarking capabilities. This work contributes to the development of more reliable, efficient, and adaptable AI-driven research assistants, facilitating advancements in document-based reasoning and retrieval tasks.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: evaluation methodologies, evaluation

Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data analysis

Languages Studied: English

Submission Number: 632
