Abstract: Recommending text documents presents significant challenges due to factors such as document length, semantic complexity, and domain specificity. Traditional content-based methods typically rely on either sparse or dense representations of full-text content, which can be computationally expensive. This paper proposes an alternative approach that represents documents based on the query terms used by multiple users (including the target user) to reach those documents during search. By focusing on the terms that led to interactions with documents on the search engine results page (SERP), we enhance the representational power of queries while reducing computational overhead. Our experiments, carried out on a large-scale dataset of legal documents from Jusbrasil, the largest online legal platform in Brazil, compare several dense (embedding-based) representations of full-text documents with sparse representations of both query terms and full document content. Evaluations were performed in two scenarios: next-item prediction and session continuation. The results show that using interaction-driven query terms for document representation yields competitive and, in some cases, superior performance compared to full-text representations.
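A minimal sketch of the core idea follows. It assumes a hypothetical click log of (query, clicked document) pairs and represents each document only by the terms of the queries that led users to click it. The abstract says these query-term representations are sparse but does not name a weighting scheme, so TF-IDF here is an assumed choice. Next-item prediction is approximated by ranking the other documents by cosine similarity to the last document the user clicked. The names `CLICK_LOG`, `build_query_term_docs`, `tfidf_vectors`, `cosine`, and `recommend_next` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical click log (assumption): (query typed on the SERP, document clicked).
# A real log would also carry user and session identifiers.
CLICK_LOG = [
    ("habeas corpus prisao preventiva", "doc_a"),
    ("prisao preventiva requisitos", "doc_a"),
    ("habeas corpus excesso de prazo", "doc_b"),
    ("prisao preventiva excesso de prazo", "doc_b"),
    ("pensao alimenticia revisao", "doc_c"),
    ("revisao pensao alimenticia desemprego", "doc_c"),
]


def tokenize(text):
    """Lowercase and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_query_term_docs(click_log):
    """Aggregate the terms of every query that led to a click on each document."""
    docs = defaultdict(Counter)
    for query, doc_id in click_log:
        docs[doc_id].update(tokenize(query))
    return docs


def tfidf_vectors(term_counts):
    """Turn per-document term counts into sparse TF-IDF vectors {term: weight}.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so every weight is positive
    (an assumed weighting, not necessarily the paper's).
    """
    n_docs = len(term_counts)
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    return {
        doc_id: {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        }
        for doc_id, counts in term_counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_next(last_doc, vectors, k=2):
    """Rank all other documents by similarity to the last clicked document."""
    scored = [
        (cosine(vectors[last_doc], vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != last_doc
    ]
    # Highest similarity first; ties broken by document id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[:k]]


vectors = tfidf_vectors(build_query_term_docs(CLICK_LOG))
print(recommend_next("doc_a", vectors))  # prints ['doc_b', 'doc_c']
```

In this sketch no document text is ever read: `doc_a` and `doc_b` become similar only because users reached both with overlapping queries. A document that has never been clicked from the SERP gets no vector at all, which is one trade-off of this representation relative to full-text approaches.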
External IDs: doi:10.1007/978-3-031-88714-7_6