LLM-as-an-Explainer: Evaluating and Aligning LLM-generated Explanations for Scientific Concepts

ACL ARR 2026 January Submission 7975 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Concept Explanation, Resources and Evaluation, Alignment
Abstract: People are increasingly using Large Language Models (LLMs) to explain unfamiliar scientific concepts. However, it is unclear whether LLM-generated explanations are accurate, clear, and useful. In this paper, we investigate \textit{LLM-as-an-explainer} by (1) evaluating the quality of LLM-generated concept explanations, and (2) aligning open-source LLMs to produce \textit{high-quality} concept explanations. In particular, we collect a large-scale dataset of 31,160 explanations generated by ten LLMs, covering concepts from six disciplines: Social Science, Biomedical Science, Mental Health, Computer Science, Law and Policy, and Finance. Next, we design a principle-guided evaluation framework that systematically assesses the quality of LLM-generated explanations. Our human validation shows substantial agreement between the proposed evaluation framework and human judgments. Finally, we propose \textsc{ExpDPO}, which aligns lightweight LLMs by learning from multi-level \textit{good} and \textit{bad} paired concept explanations. Experiments show that the aligned LLMs can outperform their larger variants on this task.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, automatic creation and evaluation of language resources, NLP datasets, automatic evaluation of datasets, evaluation methodologies
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Data resources, Data analysis
Languages Studied: English
Submission Number: 7975