LLM-as-an-Explainer: Evaluating and Aligning LLM-generated Explanations for Scientific Concepts

ACL ARR 2026 January Submission 7975 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Concept Explanation, Resources and Evaluation, Alignment
Abstract: People are increasingly using Large Language Models (LLMs) to explain unfamiliar scientific concepts. However, it is unclear whether LLM-generated explanations are accurate, clear, and useful. In this paper, we investigate \textit{LLM-as-an-explainer} by (1) evaluating the quality of LLM-generated concept explanations, and (2) aligning open-source LLMs to produce \textit{high-quality} concept explanations. In particular, we collect a large-scale dataset of 31,160 explanations generated by ten LLMs, covering concepts from six disciplines: Social Science, Biomedical Science, Mental Health, Computer Science, Law and Policy, and Finance. Next, we design a principle-guided evaluation framework that systematically assesses the quality of LLM-generated explanations. Our human validation shows substantial agreement between the proposed evaluation framework and human judgments. Finally, we propose \textsc{ExpDPO}, which aligns lightweight LLMs by learning from multi-level \textit{good} and \textit{bad} paired concept explanations. Experiments show that the aligned LLMs can outperform their larger variants on this task.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, automatic creation and evaluation of language resources, NLP datasets, automatic evaluation of datasets, evaluation methodologies
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Data resources, Data analysis
Languages Studied: English
Submission Number: 7975