Abstract: Passage retrieval is a critical yet often neglected component of fact-checking and question answering systems. Most systems rely solely on traditional sparse retrieval, which can significantly hurt recall, especially when the relevant passages share few words with the query sentence. Recent approaches have attempted to learn dense representations of queries and passages to better capture the latent semantic content of text. While dense retrieval models have proven effective in question answering, there is no relevant work on improving evidence retrieval in fact-checking. In this work, we show that training a dense retriever is sufficient to outperform traditional sparse representations in both question answering and fact-checking. We construct a new dataset called Factual-NLI, comprising factual claims and their supporting evidence, and demonstrate that using it to train a dense retriever significantly improves evidence retrieval. Experimental results on the MSMARCO dataset indicate that pre-training with Factual-NLI and other NLI datasets is also effective for large-scale passage retrieval in question answering. Our model is incorporated in a real-world semantic search engine that returns snippets containing evidence related to questions and claims about the COVID-19 pandemic.