Abstract: Differentiable Search Indexing (DSI) is a recent paradigm for information retrieval that uses a transformer-based neural network architecture as the document index, simplifying the retrieval process. A differentiable index has many advantages, enabling modifications, updates, or extensions to the index. In this work, we explore balancing relevance and novel information content (diversity) when training DSI systems, inspired by Maximal Marginal Relevance (MMR), and show the benefits of our approach over naive DSI training. We present quantitative and qualitative evaluations of relevance and diversity measures obtained with our method on the NQ320K and MSMARCO datasets, in comparison to naive DSI. With our approach, it is possible to achieve diversity without any significant impact on relevance. Since we induce diversity while training DSI, the trained model learns to diversify while remaining relevant. This obviates the need for a post-processing step, as typically performed with MMR, to induce diversity in the recall set. Our approach will be useful for information retrieval problems where both relevance and diversity are important, such as sub-topic retrieval. Our work can also easily be extended to incremental DSI settings, which would enable fast updates to the index while retrieving a diverse recall set.
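For reference, the standard MMR criterion alluded to above selects, at each step, the candidate that trades off relevance to the query against similarity to already-selected documents (a sketch of the classic formulation; the specific similarity functions $\mathrm{Sim}_1$, $\mathrm{Sim}_2$ and the trade-off parameter $\lambda$ used in this work are not specified in the abstract):
$$
\mathrm{MMR} = \arg\max_{d_i \in R \setminus S} \Big[ \lambda \, \mathrm{Sim}_1(d_i, q) - (1 - \lambda) \max_{d_j \in S} \mathrm{Sim}_2(d_i, d_j) \Big],
$$
where $q$ is the query, $R$ is the candidate set, and $S$ is the set of documents selected so far.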