Contradiction Retrieval via Contrastive Learning with Sparsity

Haike Xu; Zongyu Lin; Kai-Wei Chang; Yizhou Sun; Piotr Indyk

Published: 01 May 2025, Last Modified: 18 Jun 2025

Venue: ICML 2025 poster

License: CC BY 4.0

TL;DR: We design SparseCL, a novel approach for contradiction retrieval, which leverages specially trained sentence embeddings and a combined metric of cosine similarity and sparsity to efficiently identify documents that contradict a given query.

Abstract: Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications such as fact checking and data cleaning. For retrieving arguments that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models each have limitations. To address these challenges, we introduce a novel approach, SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method uses a combined metric of cosine similarity and a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically speeds up contradiction detection by replacing exhaustive document comparisons with simple vector calculations. We conduct contradiction retrieval experiments on Arguana, MSMARCO, and HotpotQA, where our method produces an average improvement of $11.0\%$ across different models. We also validate our method on downstream tasks such as natural language inference and cleaning corrupted corpora. This paper outlines a promising direction for non-similarity-based information retrieval, which is currently underexplored.
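To make the "combined metric of cosine similarity and a sparsity function" concrete, here is a minimal sketch of how such a score could be computed over precomputed embeddings. The abstract does not name the sparsity function, so this sketch assumes the Hoyer sparsity measure applied to the difference between query and document embeddings. The weight `alpha` and the function names `hoyer_sparsity`, `contradiction_scores`, and `retrieve` are hypothetical and chosen for illustration. It is not the authors' implementation and leaves out the contrastive training of the embeddings.

```python
import numpy as np


def cosine_similarity(query, docs):
    """Cosine similarity between one query vector and each row of docs."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q


def hoyer_sparsity(diffs):
    """Hoyer sparsity of each row: 1 for a one-hot vector, 0 for a uniform one.

    ASSUMPTION: the abstract only says "a sparsity function"; Hoyer is used
    here as one plausible choice. Requires embedding dimension > 1.
    """
    diffs = np.asarray(diffs, dtype=float)
    sqrt_dim = np.sqrt(diffs.shape[1])
    l1 = np.abs(diffs).sum(axis=1)
    l2 = np.linalg.norm(diffs, axis=1)
    # For an all-zero row, set the ratio to sqrt(dim) so its sparsity is 0.
    ratio = np.divide(l1, l2, out=np.full_like(l1, sqrt_dim), where=l2 > 0)
    return (sqrt_dim - ratio) / (sqrt_dim - 1.0)


def contradiction_scores(query, docs, alpha=1.0):
    """Combined score: cosine similarity plus weighted sparsity of (doc - query).

    alpha is a hypothetical weighting parameter for this sketch.
    """
    return cosine_similarity(query, docs) + alpha * hoyer_sparsity(docs - query)


def retrieve(query, docs, k=1, alpha=1.0):
    """Indices of the k documents with the highest combined score."""
    scores = contradiction_scores(query, docs, alpha)
    return np.argsort(-scores)[:k]
```

The intuition behind this design is that a contradicting document should stay close to the query overall (high cosine similarity) while differing in only a few embedding coordinates (a sparse difference vector). Both terms are plain vector operations over precomputed embeddings, which is consistent with the abstract's claim that retrieval reduces to simple vector calculations.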

Lay Summary: Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is crucial for downstream applications such as fact-checking and data cleaning. To tackle this task, we introduce SparseCL, a novel approach that uses specially trained sentence embeddings designed to capture subtle, contradictory nuances between sentences. Our method combines cosine similarity with a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically speeds up contradiction detection by replacing exhaustive document comparisons with simple vector calculations. This paper outlines a promising direction for non-similarity-based information retrieval, which is currently underexplored.

Primary Area: Deep Learning->Other Representation Learning

Keywords: contradiction retrieval, sentence embedding

Submission Number: 14984
