Efficient Citer: Tuning Large Language Models for Enhanced Answer Quality and Verification

Marzieh S. Tahaei, Aref Jafari, Ahmad Rashid, David Alfonso-Hermelo, Khalil Bibi, Yimeng Wu, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

Published: 2024, Last Modified: 25 Jan 2025NAACL-HLT (Findings) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: In recent years, there has been a growing interest in utilizing external knowledge to reduce hallucinations in large language models (LLMs) and provide them with updated information. Despite this improvement, a major challenge lies in the lack of explicit citations, which hampers the ability to verify the information generated by these models.This paper focuses on providing models with citation capabilities efficiently. By constructing a dataset of citations, we train two model architectures: an FID-style FLAN-T5 model for efficient answer composition and a 13B model known for its success in instruction following after tuning. Evaluation on fluency, correctness, and citation quality is conducted through human assessment and the newly introduced Automatic LLMs’ Citation Evaluation (ALCE) benchmark.Results demonstrate significant improvements in answer quality and efficiency, surpassing the performance of the popular ChatGPT on some of the metrics. The models exhibit exceptional out-of-domain generalization in both human and automatic evaluation. Notably, the FID-style FLAN-T5 model with only 3B parameters performs impressively compared to the 13B model.