KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
Abstract: The evaluation of large language models (LLMs) has attracted increasing attention. Existing approaches, including human evaluation, static dataset-based evaluation, and LLM-based evaluation, face limitations such as data contamination, constrained generalizability, and high cost coupled with limited scalability. In this paper, we introduce the Knowledge-grounded Interactive Evaluation (KIEval) framework, a novel approach to assessing instruction-tuned LLMs. Starting from a question in a conventional LLM benchmark that involves domain-specific knowledge, KIEval exploits dynamically generated, knowledge-centric multi-round dialogues to mitigate data contamination and enhance the reliability of evaluation. The KIEval framework generalizes across domains and tasks, yielding a scalable and cost-effective approach that efficiently yet robustly assesses the knowledge generalization and generation capabilities of LLMs. With KIEval, we hope to offer new insights into evaluating LLMs effectively in conversational scenarios and into how data contamination affects LLMs' real-world performance.
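The interaction loop sketched in the abstract, seeding a multi-round dialogue with a benchmark question, letting an interactor model generate knowledge-centric follow-ups, and scoring each turn with an evaluator model, can be illustrated with a minimal sketch. The function names, prompts, and scoring scheme below are assumptions made for exposition only, not the authors' implementation.

```python
# Illustrative sketch only: names, prompts, and the judging scheme are assumptions,
# not the paper's actual KIEval implementation.
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any text-in/text-out model interface


def interactive_eval_round(seed_question: str,
                           candidate: LLM,
                           interactor: LLM,
                           evaluator: LLM,
                           num_turns: int = 3) -> List[Tuple[str, str, str]]:
    """Run a multi-turn, knowledge-centric dialogue seeded by one benchmark question.

    Returns a list of (question, answer, judgment) triples, one per turn.
    """
    transcript: List[Tuple[str, str, str]] = []
    question = seed_question
    history = ""
    for _ in range(num_turns):
        # The candidate model answers the current question in context.
        answer = candidate(f"{history}Q: {question}\nA:")
        # An evaluator model judges the answer for accuracy and depth.
        judgment = evaluator(
            "Judge the factual accuracy and depth of this answer.\n"
            f"Question: {question}\nAnswer: {answer}\nVerdict:"
        )
        transcript.append((question, answer, judgment))
        history += f"Q: {question}\nA: {answer}\n"
        # An interactor model probes deeper into the same knowledge, so later turns
        # cannot be answered by memorizing the original benchmark item.
        question = interactor(
            "Given the dialogue so far, ask a harder follow-up question that "
            f"tests the same underlying knowledge.\n{history}Follow-up:"
        )
    return transcript
```

Because the follow-up questions are generated dynamically at evaluation time, a model that has merely memorized the seed benchmark item gains little advantage in later turns, which is the contamination-mitigation idea the abstract describes.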
Paper Type: long
Research Area: Resources and Evaluation
Languages Studied: English