Scalable Range Search over Temporal and Numerical Expressions

Published: 07 Jun 2024, Last Modified: 07 Jun 2024
Venue: ICTIR 2024
License: CC BY 4.0
Keywords: Temporal Expressions, Numerical Expressions, Indexing, Search, Efficiency, Analytics, Range Search
Abstract: Natural language expressions of time and numbers can be ambiguous (e.g., "2020s" can refer to any year in that decade, such as 2021 or 2025), can appear at different granularities, or can be unbounded (e.g., "more than ten percent"). To match and retrieve such ambiguous temporal and numerical expressions over millions of documents, we present NASH. Our experiments on collections amounting to more than 22 million documents show that NASH provides significant speedups on the order of 19.23x to 53.10x for contains and near queries. NASH achieves this while using indexes that are 1.90x to 2.05x smaller than the indexes used by the baselines. We further demonstrate NASH's scalability to the Web by indexing a subset of Common Crawl amounting to more than 365 million documents.
Submission Number: 9
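Illustration: the abstract does not say how NASH represents expressions or how it defines its contains and near queries, so the following is only a minimal sketch under stated assumptions. It assumes that each temporal or numerical expression can be modeled as a numeric interval, that "contains" means interval containment, and that "near" means a bounded gap between intervals. All names (Interval, contains_query, near_query, max_gap) are hypothetical, and the linear scan stands in for whatever index structure NASH actually uses.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Interval:
    """Closed numeric interval [lo, hi]; either bound may be infinite."""
    lo: float
    hi: float

    def contains(self, other: "Interval") -> bool:
        # True when `other` lies entirely inside this interval.
        return self.lo <= other.lo and other.hi <= self.hi

    def gap(self, other: "Interval") -> float:
        # Distance between the two intervals; 0.0 when they overlap or touch.
        return max(0.0, other.lo - self.hi, self.lo - other.hi)


def contains_query(query: Interval, expressions: dict) -> list:
    """Return names of expressions whose interval lies inside `query` (linear scan)."""
    return [name for name, iv in expressions.items() if query.contains(iv)]


def near_query(query: Interval, expressions: dict, max_gap: float) -> list:
    """Return names of expressions within `max_gap` of `query` (assumed semantics)."""
    return [name for name, iv in expressions.items() if query.gap(iv) <= max_gap]


# Assumed interval readings of a few expressions.
expressions = {
    "2021": Interval(2021, 2021),
    "2020s": Interval(2020, 2029),   # ambiguous: any year in the decade
    "1990s": Interval(1990, 1999),
}

# "more than ten percent" as an unbounded interval; the strict lower bound
# is treated as closed here for simplicity.
more_than_ten_percent = Interval(10.0, INF)
```

Under these assumptions, contains_query(Interval(2020, 2029), expressions) would match both "2021" and "2020s", which shows how expressions at different granularities can be retrieved by one range query.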