UniDE: A multi-level and low-resource framework for automatic dialogue evaluation via LLM-based data augmentation and multitask learning
Abstract: Highlights
• A unified dialogue evaluation framework with diverse-level measurement dimensions.
• A small evaluator outperforms GPT-4 evaluator, highlighting low-resource efficiency.
• A feasible NLP paradigm is to train small LLMs on high-quality data from large LLMs.
• Extensive experimental results on popular datasets validate the superiority of UniDE.