Rethinking information retrieval in a re-decentralised web: exploring the feasibility and quality of search across personal online datastores
Abstract: Traditional information retrieval (IR) models, such as keyword-based and vector-based techniques, have long been used in centralized systems. However, the Web’s re-decentralization, with its focus on data ownership and privacy, calls for re-evaluating these methods in decentralized settings. While standards for decentralized search enhance privacy to some extent, they also introduce computational overhead, black-box decision-making, and infrastructure complexity. Despite these challenges, traditional IR techniques remain largely unexplored in such environments. This paper applies traditional IR models to the decentralized Web by adapting them for Personal Online Data Stores (PODs), where search parties have varying access rights. We explore their role in source selection, document ranking, and result merging, and extend them to meet the demands of decentralized search. Using Solid PODs and a synthetic medical dataset, we evaluate these models in a privacy-sensitive environment. Our findings show that the extended IR methods balance performance, interpretability, and efficiency effectively, and that they hold strong potential as privacy-preserving alternatives for search on a re-decentralized Web. Notably, our top-performing model achieved results competitive with centralized search systems in top-item retrieval, maintaining high relevance scores under both limited and full data access conditions.
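The three stages named in the abstract (source selection, document ranking, result merging) can be illustrated with a minimal Python sketch. It is not the paper's method: the `Doc` type, the `readers` access sets, the raw term-frequency scoring, and the pod-weighted merging are all assumptions chosen for brevity, and the in-memory `pods` dictionary merely stands in for real Solid PODs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    readers: set = field(default_factory=set)  # hypothetical: agents allowed to read


def visible(docs, agent):
    """Keep only the documents this agent has read access to."""
    return [d for d in docs if agent in d.readers]


def tf_score(query, text):
    """Toy keyword score: summed term frequency of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query.lower().split())


def search(pods, agent, query, top_pods=2, k=3):
    # 1. Source selection: score each pod using only the agent's readable docs.
    pod_scores = {
        name: sum(tf_score(query, d.text) for d in visible(docs, agent))
        for name, docs in pods.items()
    }
    selected = sorted(
        (n for n, s in pod_scores.items() if s > 0),
        key=lambda n: (-pod_scores[n], n),
    )[:top_pods]
    if not selected:
        return []
    top_score = pod_scores[selected[0]]

    merged = []
    for name in selected:
        # 2. Document ranking inside the selected pod.
        scored = [(tf_score(query, d.text), d.doc_id) for d in visible(pods[name], agent)]
        scored = [(s, i) for s, i in scored if s > 0]
        best = max(s for s, _ in scored)
        # 3. Result merging: normalise per pod, then weight by the pod's score.
        weight = pod_scores[name] / top_score
        merged += [(weight * s / best, i) for s, i in scored]

    merged.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in merged[:k]]


# Invented example data, loosely echoing the medical setting.
pods = {
    "alice": [Doc("a1", "diabetes insulin dosage diabetes", {"dr_x", "alice"}),
              Doc("a2", "knee surgery notes", {"alice"})],
    "bob": [Doc("b1", "diabetes screening result", {"dr_x"}),
            Doc("b2", "insulin diabetes plan", {"bob"})],
    "carol": [Doc("c1", "allergy report", {"dr_x"})],
}
```

Calling `search(pods, "dr_x", "diabetes insulin")` and `search(pods, "bob", "diabetes insulin")` on the same data returns different result lists, because access rights are applied before any scoring, so each party's pod and document scores reflect only what that party may read.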
External IDs: doi:10.1145/3777445