TABi: Type-Aware Bi-encoders for End-to-End Entity Retrieval


16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: Entity retrieval (retrieving information about the entities mentioned in a query) is a core step in open-domain tasks such as question answering and fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities. There are two key challenges: (1) most retrievers are trained on unstructured text about entities and ignore structured data, such as entity types, that can be difficult to learn from text alone, and (2) methods that do leverage structured types are not designed for end-to-end retrieval, which open-domain tasks require. In this work, we introduce TABi, a method to jointly train bi-encoders on unstructured text and structured types for end-to-end retrieval. TABi uses a type-enforced contrastive loss to encode type information in the embedding space and trains over datasets from multiple open-domain tasks to learn to retrieve entities. We demonstrate that this simple method improves retrieval of rare entities on the AmbER sets while maintaining strong overall retrieval performance on open-domain tasks compared to state-of-the-art retrievers. We also find that TABi produces embeddings that better capture types, as measured on a nearest-neighbor type classification task and an entity similarity task.
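To make the idea of a type-enforced contrastive loss concrete, here is a minimal sketch in the style of a supervised contrastive loss, in which queries that share an entity type are treated as positives for one another. This is an illustrative assumption, not the formulation from the paper: the function name `type_contrastive_loss`, the `temperature` value, and the toy embeddings and type labels are all hypothetical, and the abstract does not specify how the type term is combined with the entity retrieval objective.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def type_contrastive_loss(embeddings, types, temperature=0.1):
    """Hypothetical type-level contrastive loss (an assumption, not TABi's exact loss).

    For each query embedding, every other query in the batch with the same
    type label is a positive; all remaining queries act as negatives.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and types[j] == types[i]]
        if not positives:
            continue  # no same-type partner in this batch, so skip the anchor
        # Temperature-scaled similarities to every other item in the batch.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(val) for val in logits.values()))
        # Average negative log-probability assigned to the same-type positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy batch: two embeddings near the x-axis, two near the y-axis.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_types = ["person", "person", "place", "place"]
misaligned_types = ["person", "place", "person", "place"]

loss_aligned = type_contrastive_loss(embeddings, aligned_types)
loss_misaligned = type_contrastive_loss(embeddings, misaligned_types)
```

In this toy batch the loss is lower when the type labels agree with the geometry of the embeddings than when they cut across it, which is the pressure such a loss would apply during training: embeddings of same-type queries are pulled together and embeddings of different-type queries are pushed apart.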
