Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics
Abstract: Recent work on language models often applies reinforcement learning with human-annotated preference data to enhance specific capabilities, such as generating informative summaries.
However, such data typically reflects overall quality preferences and overlooks factuality.
Since collecting new annotations is costly, we propose to use automatic factuality metrics to obtain factuality preference labels.
While individual factuality metrics are limited, their combination can effectively capture diverse factual errors.
We introduce an automated training pipeline that improves summarisation factuality via preference optimisation.
For each source document, we generate lexically similar summary pairs by varying decoding strategies, so that the pairs differ mainly in minor factual errors and the model learns from them.
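A minimal sketch of this pair-generation step, assuming a Hugging Face seq2seq summariser; the model name, decoding settings, and overlap threshold are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: produce a lexically similar summary pair for one document by
# varying only the decoding strategy. "facebook/bart-large-cnn" and the
# 0.7 overlap threshold are illustrative choices.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

def generate_summary_pair(document: str) -> tuple[str, str]:
    inputs = tokenizer(document, truncation=True, return_tensors="pt")
    # Deterministic beam search for the first summary.
    beam_ids = model.generate(**inputs, num_beams=4, max_new_tokens=128)
    # Nucleus sampling for the second: similar content, small perturbations.
    sample_ids = model.generate(
        **inputs, do_sample=True, top_p=0.9, max_new_tokens=128
    )
    a = tokenizer.decode(beam_ids[0], skip_special_tokens=True)
    b = tokenizer.decode(sample_ids[0], skip_special_tokens=True)
    return a, b

def lexical_overlap(a: str, b: str) -> float:
    """Unigram Jaccard overlap; a simple proxy for 'lexically similar'."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

# Keep only near-duplicate pairs, so that any difference between the two
# summaries is likely a minor factual edit rather than a stylistic one.
document = "..."  # source article text
pair = generate_summary_pair(document)
if lexical_overlap(*pair) >= 0.7:
    print(pair)
```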
To avoid human annotation, we derive preference labels from multiple weak factuality metrics, filtering out cases where the metrics conflict to improve reliability.
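A minimal sketch of this labelling-and-filtering step; the scorer interface is an assumption, standing in for any document-conditioned factuality metric (e.g., an NLI- or QA-based scorer) that returns a float:

```python
# Sketch: derive a preference label for a summary pair from several weak
# factuality metrics, keeping the pair only when the metrics agree.
from typing import Callable, Optional

# A scorer maps (document, summary) to a factuality score (higher = better).
Scorer = Callable[[str, str], float]

def preference_label(
    document: str,
    summary_a: str,
    summary_b: str,
    scorers: list[Scorer],
) -> Optional[str]:
    votes = []
    for score in scorers:
        diff = score(document, summary_a) - score(document, summary_b)
        if diff != 0:
            votes.append("a" if diff > 0 else "b")
    # Filter out conflicting cases: keep the pair only if every metric
    # that expressed a preference picked the same summary.
    if votes and all(v == votes[0] for v in votes):
        return votes[0]
    return None  # discarded as unreliable
```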
This yields a high-quality preference dataset constructed from source documents alone.
Experiments show consistent factuality gains across models, from early encoder-decoder architectures to modern large language models, with smaller models reaching factuality comparable to that of larger ones.
Code and data will be released upon acceptance.
Paper Type: Long
Research Area: Summarization
Research Area Keywords: abstractive summarisation, factuality
Contribution Types: Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 3227