Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

01 Jun 2022 (modified: 05 May 2023) · ICML 2022 Workshop KRLM
Keywords: retrieval augmented generation, KILT, Fusion-in-Decoder
TL;DR: We propose relevance confidence filtering for multi-task training of the FiD model on the KILT benchmark.
Abstract: This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by exploiting a distinctive property of knowledge-intensive generation: the connection of query-answer pairs to items in the knowledge base. We filter training examples by thresholding confidence on the relevance labels, i.e., whether a pair is answerable from the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves competitive baselines on two strongly imbalanced tasks, and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate that our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results on five of the seven KILT tasks.
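The filtering step described in the abstract is straightforward to express in code. Below is a minimal sketch, assuming each pooled training example carries a relevance-confidence score (e.g., from a retriever or relevance classifier); the field name `relevance_confidence`, the task names, and the threshold value `0.5` are hypothetical placeholders for illustration, not values taken from the paper.

```python
# Minimal sketch of relevance-confidence filtering for a pooled multi-task
# training set. Assumptions (not from the paper's text): each example is a
# dict with a "relevance_confidence" score in [0, 1] estimating whether the
# query-answer pair is answerable from the knowledge base, and the threshold
# 0.5 is a hypothetical placeholder.

def filter_by_relevance(examples, threshold=0.5):
    """Keep only query-answer pairs whose relevance confidence meets the
    threshold, i.e., pairs likely answerable from the knowledge base."""
    return [ex for ex in examples if ex["relevance_confidence"] >= threshold]

# Example: pool examples from several KILT tasks, then clean the combined set
# before training a single Fusion-in-Decoder (FiD) generator on it.
multi_task_examples = [
    {"task": "nq", "query": "q1", "answer": "a1", "relevance_confidence": 0.92},
    {"task": "fever", "query": "q2", "answer": "a2", "relevance_confidence": 0.31},
]
clean_training_set = filter_by_relevance(multi_task_examples, threshold=0.5)
# -> keeps only the first example; the second falls below the threshold.
```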