Keywords: mentorship-focused QA, multilingual question answering, long-form content understanding, educational AI, multi-agent, LLM-based evaluation, low-resource languages
Abstract: Question answering systems are typically evaluated on factual correctness, yet many real-world applications, such as education and career guidance, require mentorship: responses that encourage reflection and offer guidance. Existing QA benchmarks rarely capture this distinction, particularly in multilingual and long-form settings.
We introduce MentorQA, the first multilingual dataset and evaluation framework for mentorship-focused question answering from long-form videos, comprising nearly 9,000 QA pairs from 180 hours of content across four languages. We define mentorship-focused evaluation dimensions that go beyond factual accuracy, capturing clarity, alignment, and learning value.
Using MentorQA, we compare Single-Agent, Dual-Agent, retrieval-augmented generation (RAG), and Multi-Agent QA architectures under controlled conditions. Multi-Agent pipelines consistently produce higher-quality mentorship responses, with especially strong gains on complex topics and in lower-resource languages. We further analyze the reliability of automated LLM-based evaluation, observing substantial variation in its alignment with human judgments.
Overall, this work establishes mentorship-focused QA as a distinct research problem and provides a multilingual benchmark for studying agentic architectures and evaluation design in educational AI. The dataset and evaluation framework are released at https://anonymous.4open.science/r/MentorQA/.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: commonsense QA; reading comprehension; logical reasoning; open-domain QA; question generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English, Hindi, Chinese, Romanian
Submission Number: 1049