S-Evaluator: Enhance Factual Consistency Evaluator with Adversarial Data Synthesized by Large Language Model

Published: 01 Jan 2024, Last Modified: 30 Sept 2024. ICASSP 2024. CC BY-SA 4.0.
Abstract: With the rapid development of LLMs, evaluating the factual consistency between source documents and generated texts plays an increasingly crucial role in natural language generation (NLG). Recent methods typically suffer from training data of low quality and insufficient quantity. In this paper, we propose a method for synthesizing factual consistency data by harnessing the vast knowledge stored within large language models. The synthetic data produced through this approach demonstrates notable discriminative ability and robustness when used to train a factual consistency evaluation model. We conduct experiments on two benchmark datasets (TRUE and SummaC), and our method achieves 3.5% and 4.6% relative improvements on the AUC-ROC metric, respectively.
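The reported gains are measured with AUC-ROC, which treats factual consistency evaluation as ranking consistent (label 1) document–summary pairs above inconsistent (label 0) ones. As a minimal illustration of how that metric is computed (the labels and evaluator scores below are invented, not from the paper), AUC-ROC equals the fraction of positive/negative pairs the evaluator orders correctly:

```python
def auc_roc(labels, scores):
    """Pairwise AUC-ROC: probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical evaluator outputs: 1 = consistent with source, 0 = inconsistent
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]  # predicted consistency probabilities
print(auc_roc(labels, scores))  # 0.75: 3 of the 4 pos/neg pairs are ranked correctly
```

A relative improvement of 3.5% on this metric means, e.g., moving a baseline AUC-ROC of 0.800 to roughly 0.828.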