Abstract: Quantitative information plays a crucial role in understanding and interpreting the content of documents.
Many user queries contain quantities and cannot be resolved without understanding their semantics, e.g., ``car that costs less than $\$10$k''.
Yet, modern search engines apply the same ranking mechanisms to both words and quantities, overlooking magnitude and unit information.
In this paper, we introduce two quantity-aware ranking techniques designed to rank both the quantity and textual content either jointly or independently.
These techniques incorporate quantity information into existing retrieval systems and can address
queries with the numerical conditions equal, greater than, and less than.
To evaluate the effectiveness of our proposed models, we introduce two novel quantity-aware benchmark datasets in the domains of finance and medicine and compare our method against various lexical and neural models.
The code and data are available at \url{https://github.com/filled_in_later}.
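For illustration only, the following is a minimal sketch of the joint quantity-aware ranking idea sketched in the abstract, not the paper's actual implementation. It assumes documents come pre-annotated with (value, unit) quantity spans, parses the comparator and target quantity from the query with a toy regular expression, and interpolates a normalized lexical score with a quantity-satisfaction score. All function names, the regex, and the weight alpha are illustrative assumptions.

import re
from dataclasses import dataclass

@dataclass
class Quantity:
    value: float
    unit: str

def parse_query_condition(query):
    """Extract (comparator, Quantity) from queries like 'car that costs less than $10k'."""
    m = re.search(r"(less than|under|more than|over|equal to)?\s*\$?(\d+(?:\.\d+)?)\s*(k|m)?",
                  query, re.IGNORECASE)
    if not m:
        return None
    op = {"less than": "<", "under": "<", "more than": ">", "over": ">"}.get(
        (m.group(1) or "equal to").lower(), "=")
    scale = {"k": 1e3, "m": 1e6}.get((m.group(3) or "").lower(), 1.0)
    return op, Quantity(float(m.group(2)) * scale, "$")

def quantity_score(op, target, doc_quantities):
    """1.0 if any extracted document quantity satisfies the query condition, else 0.0."""
    checks = {"<": lambda v: v < target.value,
              ">": lambda v: v > target.value,
              "=": lambda v: abs(v - target.value) <= 0.05 * target.value}
    return float(any(q.unit == target.unit and checks[op](q.value) for q in doc_quantities))

def joint_score(text_score, op, target, doc_quantities, alpha=0.5):
    """Interpolate a normalized lexical score with the quantity-satisfaction score."""
    return alpha * text_score + (1 - alpha) * quantity_score(op, target, doc_quantities)

# Usage: re-rank two lexical candidates for "car that costs less than $10k".
op, target = parse_query_condition("car that costs less than $10k")
candidates = [("doc_a", 0.62, [Quantity(9500, "$")]),   # (id, normalized text score, quantities)
              ("doc_b", 0.71, [Quantity(15000, "$")])]
ranked = sorted(candidates, key=lambda d: joint_score(d[1], op, target, d[2]), reverse=True)
print([doc_id for doc_id, _, _ in ranked])  # doc_a satisfies the condition and ranks first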
Paper Type: long
Research Area: Information Retrieval and Text Mining
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Preprint Status: We are considering releasing a non-anonymous preprint in the next two months (i.e., during the reviewing process).
A1: yes
A1 Elaboration For Yes Or No: Section 6 - Limitations
A2: no
A2 Elaboration For Yes Or No: Our work does not have potential malicious or unintended harmful effects and uses only publicly available data.
A3: yes
A3 Elaboration For Yes Or No: Refer to Abstract and Section 1 - Introduction
B: yes
B1: yes
B1 Elaboration For Yes Or No: Section 3 describes the model, and Sections 4.1 and C.1 describe the datasets.
B2: no
B2 Elaboration For Yes Or No: We publish the data and code, and anyone can use them.
B3: no
B3 Elaboration For Yes Or No: All other models used in this work are open source under the MIT license.
B4: no
B4 Elaboration For Yes Or No: The news articles are from prominent news websites and do not contain data about individuals; the data from TREC has already been anonymized by its creators.
B5: yes
B5 Elaboration For Yes Or No: Section 4.1
B6: yes
B6 Elaboration For Yes Or No: Section 4.1 and C.3
C: no
C1: yes
C1 Elaboration For Yes Or No: Section C.3. We did not report the number of parameters because we used well-known models whose parameter counts were reported in previous work, and we did not change those settings.
C2: yes
C2 Elaboration For Yes Or No: Section C.3
C3: yes
C3 Elaboration For Yes Or No: Section 4 - we report single runs
C4: yes
C4 Elaboration For Yes Or No: Section C.3
D: yes
D1: no
D1 Elaboration For Yes Or No: The annotation guidelines are part of the data we uploaded, and we explain the annotation process in Section C.1.
D2: no
D2 Elaboration For Yes Or No: There was no payment; the authors of the paper performed the annotation.
D3: no
D3 Elaboration For Yes Or No: The data we used is publicly available on the news websites or released as open data by TREC.
D4: no
D4 Elaboration For Yes Or No: This does not apply to us
D5: no
D5 Elaboration For Yes Or No: The data we used is publicly available on the news websites or released as open data by TREC.
E: no
E1: no
E1 Elaboration For Yes Or No: We did not use AI assistants.