Abstract: Entity retrieval, the task of retrieving information about the entities mentioned in a query, is a core step in open-domain tasks such as question answering and fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities. There are two key challenges: (1) most retrievers are trained on unstructured text about entities and ignore structured data, such as entity types, that can be difficult to learn from text alone, and (2) methods that do leverage structured types are not designed for end-to-end retrieval, which open-domain tasks require. In this work, we introduce TABi, a method to jointly train bi-encoders on unstructured text and structured types for end-to-end retrieval. TABi uses a type-enforced contrastive loss to encode type information in the embedding space and trains over datasets from multiple open-domain tasks to learn to retrieve entities. We demonstrate that this simple method can improve retrieval of rare entities on the AmbER sets while maintaining strong overall retrieval performance on open-domain tasks compared to state-of-the-art retrievers. We also find that TABi produces embeddings that better capture types, as measured on a nearest-neighbor type classification task and an entity similarity task.
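To make the idea of a type-enforced contrastive loss more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a supervised-contrastive-style objective in which items in a batch that share an entity type are treated as positives; the abstract does not give the exact loss, so the function name `type_contrastive_loss`, the `temperature` parameter, and the toy embeddings are all illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def type_contrastive_loss(embeddings, types, temperature=0.1):
    """Hypothetical type-aware contrastive loss (an assumption, not TABi's exact loss).

    For each anchor, every other batch item with the same type is a positive,
    and all other batch items form the softmax denominator.
    """
    embs = [normalize(e) for e in embeddings]
    n = len(embs)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and types[j] == types[i]]
        if not positives:
            continue  # an anchor with no same-type partner contributes nothing
        logits = [dot(embs[i], embs[j]) / temperature for j in range(n) if j != i]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        log_probs = [dot(embs[i], embs[p]) / temperature - log_denom for p in positives]
        total += -sum(log_probs) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy batch: two "person" items and two "place" items (made-up data).
types = ["person", "person", "place", "place"]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # same types lie close
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]      # same types lie far apart

loss_clustered = type_contrastive_loss(clustered, types)
loss_mixed = type_contrastive_loss(mixed, types)
```

Under these assumptions, the loss is lower when embeddings of the same type sit close together (`loss_clustered`) than when they are scattered (`loss_mixed`), which is the behavior a type-aware objective is meant to encourage in the embedding space.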