Abstract: Search has become a key component of many online user experiences.
Search queries are usually textual and hence should benefit from improvements in
natural language processing. However, many of the NLP algorithms used in production
systems fail for queries that require structured understanding of the query and
document or that require reasoning. These issues arise because of the way information
is stored in the search index and the need to return results quickly. The issues
are exacerbated when searching over non-textual documents, including images and
structured data. The use of embedding-based techniques has helped with some types
of searches, especially when the query vocabulary does not match that of the documents
and when searching over images. However, these techniques still fail for many
searches, especially ones requiring reasoning. Simply combining classic word-level
search and embedding-based search does not solve these issues. Instead, in this position
paper, I argue that we need to create hybrid systems that combine traditional search
techniques and embedding-based search with structured data and reasoning.
Enabling such hybrid systems will require a deep understanding of linguistic
representations of meaning, of information retrieval optimization, and of the types
of information encoded in the queries and documents. It is my hope that this paper
inspires further collaboration across disciplines to address these complex search
problems.