"Robust statistical methods in web retrieval" by Carsten Eickhoff and Arjen P. de Vries, with Martin Vesely as coordinator
Abstract: Information retrieval systems rely on multitudes of individual features in order to determine the ranking of documents for a given user and query combination. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated learning methods that make it difficult for humans to understand the origin of the final scores. To address these issues, we employ copulas, a family of robust statistical methods, introducing their formal background and empirically demonstrating their merit in a number of settings, including ranking, score fusion and language modelling.