Abstract: This paper presents the first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results. We develop a novel way of generating synthetic search sessions, which leverages the power of transformer-based language models through (self-)supervised learning. We also present a reinforcement learning agent with dynamically constrained actions that learns interactive search strategies from scratch. Our search agents obtain retrieval and answer quality comparable to recent neural methods, using only a traditional term-based BM25 ranking function and interpretable discrete reranking and filtering actions.
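The abstract's core loop, retrieving with BM25 and iteratively expanding the query with terms read off the top-ranked results, can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: the `refine` policy below greedily adds the most frequent unseen term from the top result, whereas the paper's agents learn the refinement policy (and richer operators) from data.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) in `docs` against `query` with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()                       # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency within this document
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def refine(query, docs, steps=2):
    """Toy refinement policy: expand the query with a salient term
    from the current top-ranked document at each step."""
    query = list(query)
    for _ in range(steps):
        scores = bm25_scores(query, docs)
        top = docs[max(range(len(docs)), key=scores.__getitem__)]
        # candidate expansion terms: tokens of the top result not yet in the query
        candidates = [t for t in top if t not in query]
        if not candidates:
            break
        query.append(Counter(candidates).most_common(1)[0][0])
    return query
```

For example, starting from the query `["search", "agents"]`, one refinement step pulls in a co-occurring term from the best-matching document, narrowing subsequent retrieval, the same shape of episode the paper's agents execute with learned, constrained actions.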
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:

## Revision 2

### Main Paper

* We updated Table 2 to include the PRF baseline results (RM3).
* We updated Section 5.3 (Results) to include a discussion of the PRF baseline results.

### Appendix

* We updated Table A.7 to include the PRF baseline results (RM3).
* We added paragraph E.1 (Pseudo-Relevance Feedback Baselines) for an in-depth discussion of different PRF baselines.
* We added Table A.8, showing results on NQ dev and test for the different PRF baselines.

## Revision 1

### Main Paper

* We simplified Figure 1. We now present an actual example episode from the Rocchio policy and describe the full episode in detail (including the step-by-step retrieval results) in Table A.8 in the Appendix.
* We improved the structure and phrasing of Section 2, focusing on readability.
* We added a descriptive example search session (Rocchio) in Table 1 to make the setup easier for the reader to grasp early on.
* We unified the explanation of the MuZero and T5 agents in Section 3, "Search Agents".
* We added an extensive explanation of the Rocchio session data in Section 5.1.
* We added further analysis of the Rocchio session data (Figure 2), showing the Rocchio sessions' lengths for the different grammars and the score gain at each expansion step.
* We updated the caption and labels of Table 2 to improve the clarity of the shown results.
* We added an extensive caption describing the plots in Figure 3 (Figure 2 in the previous version).
* We added Figure 4, which compares the step-wise performance metrics of the best Rocchio session data against the best learned T5 agent.
* We extended the explanation of the results in Section 5.3.
* We added a qualitative analysis of T5-G1 (boosting only) vs. T5-G4 (all operators) using an episode example in Table 4.
* We extended the discussion of the experiments in Section 5.4 with the paragraphs "Limitations of Current Policies", "Artificial vs. Human Search Policies", and "Thoughts on OpenQA-NQ".
* We improved the layout of all tables showing example search sessions.

### Appendix

* We added Table A.1, which shows the length statistics of the different grammars of the Rocchio search sessions.
* We added a computational-complexity description of the state construction to Appendix B, and of the MuZero and T5 agents to Appendix D.
* We added paragraphs about the tuning details of the MuZero agent (Sec. D.1) and the T5 agent (Sec. D.2).
Assigned Action Editor: ~Alessandro_Sordoni1
Submission Number: 11