Abstract: Technical support question-answering (QA) systems assist users in diagnosing and resolving technical issues, but ensuring their reliability remains a challenge. Existing QA systems may generate inaccurate responses due to LLM hallucinations and retrieval errors, which can lead to misleading guidance. A reliable evaluation framework is essential for systematically improving technical support QA systems and ensuring they generate accurate guidance. However, existing evaluation methods for QA systems struggle to precisely match key terms and to verify step order and completeness. To address these challenges, we propose TechSupportEval, an automated evaluation framework for technical support QA. Our framework introduces two novel techniques: (1) ClozeFact, which formulates fact verification as a cloze test and uses an LLM to fill in missing key terms, ensuring precise key-term matching, and (2) StepRestore, which shuffles the ground-truth steps and uses an LLM to reconstruct the actionable instructions in the correct order, verifying step order and completeness. To support comprehensive evaluation, we build a benchmark dataset on top of the publicly available TechQA dataset, containing responses generated by QA systems of different quality levels. TechSupportEval achieves an AUC of 0.91, outperforming the state-of-the-art method by 7.6%. The code and dataset are available at https://github.com/NetManAIOps/TechSupportEval.
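The abstract only names the two mechanisms, so the following minimal Python sketch illustrates the general ideas behind ClozeFact (mask key terms in a ground-truth fact and let an LLM fill the blanks from the evaluated response) and StepRestore (shuffle ground-truth steps and let an LLM restore their order from the response). The `call_llm` hook, the prompts, the masking rule, and the scoring are illustrative assumptions, not the paper's actual implementation; see the linked repository for the authors' code.

```python
# Illustrative sketch only; `call_llm` is a hypothetical hook for any
# chat-completion API, and prompts/scoring are simplified assumptions.
import random
import re
from typing import Callable, List


def clozefact_score(response: str, fact: str, key_terms: List[str],
                    call_llm: Callable[[str], str]) -> float:
    """Mask key terms in a ground-truth fact and ask an LLM to fill the
    blanks using only the evaluated response; exact matches suggest the
    response actually contains those key terms."""
    cloze = fact
    for term in key_terms:
        cloze = re.sub(re.escape(term), "____", cloze, count=1)
    prompt = (
        "Fill each blank (____) using only information from the answer below.\n"
        f"Answer:\n{response}\n\nCloze:\n{cloze}\n"
        "Return the blank fillers, one per line."
    )
    fillers = [ln.strip() for ln in call_llm(prompt).splitlines() if ln.strip()]
    hits = sum(1 for term, filled in zip(key_terms, fillers)
               if filled.lower() == term.lower())
    return hits / max(len(key_terms), 1)


def steprestore_score(response: str, gt_steps: List[str],
                      call_llm: Callable[[str], str], seed: int = 0) -> float:
    """Shuffle the ground-truth steps and ask an LLM to restore their order
    using the evaluated response as the only evidence; a complete, correctly
    ordered restoration suggests the response preserves the procedure."""
    shuffled = gt_steps[:]
    random.Random(seed).shuffle(shuffled)
    prompt = (
        "Reorder the shuffled steps below into the order implied by the answer. "
        "List the step numbers in the restored order, comma-separated.\n"
        f"Answer:\n{response}\n\nShuffled steps:\n"
        + "\n".join(f"{i + 1}. {s}" for i, s in enumerate(shuffled))
    )
    order = [int(n) - 1 for n in re.findall(r"\d+", call_llm(prompt))]
    restored = [shuffled[i] for i in order if 0 <= i < len(shuffled)]
    correct = sum(1 for a, b in zip(restored, gt_steps) if a == b)
    return correct / max(len(gt_steps), 1)
```

In this sketch, both scores are simple fractions in [0, 1]; the paper's framework may combine or calibrate such signals differently.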