Abstract: Contextual dense representation models for text marked a shift in text processing, enabling richer semantic understanding and more effective Information Retrieval. These models project pieces of text into a latent space, describing them in terms of shared latent concepts that are not explicitly tied to the text's content. Previous work has shown that, depending on the information need specified in the query, certain dimensions of such dense text representations can be irrelevant and detrimental to retrieval effectiveness. Higher effectiveness can be achieved by performing retrieval within a linear subspace that excludes these dimensions. Dimension IMportance Estimators (DIMEs) are models designed to identify such harmful dimensions, refining the representations of queries and documents to retain only the useful ones. Current DIMEs rely either on pseudo-relevance feedback, which often delivers inconsistent effectiveness, or on explicit relevance feedback, which is challenging to collect. Inspired by counterfactual modelling, we introduce Counterfactual DIMEs (CoDIMEs), designed to leverage noisy implicit feedback to assess the importance of each dimension. The CoDIME framework presented here uses a linear model to approximate the relationship between a document's click frequency and its interaction with a given query dimension. Empirical evaluations demonstrate that CoDIME outperforms traditional pseudo-relevance-feedback-based DIMEs and surpasses other unsupervised counterfactual methods that utilize implicit feedback.
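To make the linear-model idea concrete, the following is a minimal sketch rather than the paper's actual implementation: it assumes a small set of retrieved documents with observed click frequencies, fits an ordinary least-squares model relating clicks to the dimension-wise query-document interactions, and keeps only the query dimensions with the largest estimated coefficients. The function names, the use of plain least squares, and the `keep_ratio` parameter are illustrative assumptions, not part of the proposed method.

```python
# Illustrative sketch (assumptions, not the paper's exact formulation):
# estimate per-dimension importance by regressing click frequencies on
# dimension-wise query-document interactions, then prune the query to the
# subspace of the most important dimensions.
import numpy as np


def codime_dimension_importance(query, docs, clicks):
    """query:  (d,)  dense query embedding
    docs:   (n, d) dense embeddings of, e.g., the top-ranked documents
    clicks: (n,)   observed click frequencies (noisy implicit feedback)
    Returns a (d,) vector of estimated dimension-importance scores."""
    # Dimension-wise interactions: X[j, i] = query[i] * docs[j, i]
    X = docs * query[None, :]
    # Least-squares fit of clicks on the interactions; the coefficient for
    # dimension i approximates how much that dimension contributes to clicks.
    coef, *_ = np.linalg.lstsq(X, clicks.astype(float), rcond=None)
    return coef


def prune_dimensions(query, importance, keep_ratio=0.5):
    """Zero out the query dimensions estimated to be least important, so that
    retrieval is performed in the remaining linear subspace."""
    k = int(len(query) * keep_ratio)
    keep = np.argsort(importance)[-k:]  # indices of the k most important dims
    mask = np.zeros_like(query)
    mask[keep] = 1.0
    return query * mask


# Toy usage example with random data:
# rng = np.random.default_rng(0)
# q = rng.normal(size=128)
# D = rng.normal(size=(50, 128))
# c = rng.poisson(2.0, size=50)
# importance = codime_dimension_importance(q, D, c)
# q_pruned = prune_dimensions(q, importance, keep_ratio=0.5)
```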