Keywords: retrieval, generation, retrieval-augmented generation, open-ended generation, informative conversations, free-form QA, posterior distribution, ELBo
Abstract: Many text generation systems benefit from retrieving passages from a textual knowledge corpus (e.g., Wikipedia) and using them to generate the output. For open-ended generation tasks, like generating informative utterances in conversations, many varied passages $z$ are relevant to the context $x$ but few are relevant to the observed next utterance $y$ (label). For such tasks, existing methods (that jointly train the retriever and generator) underperform: during training the top-k context-relevant retrieved passages might not contain the label-relevant passage and the generator may hence not learn a preference to ground its generated output in them. We propose using an additional guide-retriever that also conditions on the observed label $y$ and “in hindsight” retrieves label-relevant passages during training. We maximize the evidence lower bound (ELBo) to jointly train the guide-retriever $Q(z|x,y)$ with the standard retriever $P_\eta(z|x)$ and the generator $P_\theta(y|x,z)$ and find that ELBo has better inductive biases than prior work. For informative conversations from the Wizard of Wikipedia dataset, with our posterior-guided training, the retriever finds passages with higher relevance in the top-10 (23% relative improvement), the generator’s responses are more grounded in the retrieved passage (19% relative improvement) and the end-to-end system produces better overall output (6.4% relative improvement).
One-sentence Summary: We use a posterior-guide retriever to train a retrieval-augmented generation that performs well on open-ended one-to-many generation tasks.