Abstract: In the rapidly evolving domain of Recommender Systems (RecSys), new algorithms frequently claim state-of-the-art performance based on evaluations over a limited set of arbitrarily selected datasets. However, this approach may fail to holistically reflect their effectiveness due to the significant impact of dataset characteristics on algorithm performance. Addressing this deficiency, this paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms, thereby advancing evaluation practices. By utilizing a diverse set of 30 open datasets, including two introduced in this work, and evaluating 11 collaborative filtering algorithms across 9 metrics, we critically examine the influence of dataset characteristics on algorithm performance. We further investigate the feasibility of aggregating outcomes from multiple datasets into a unified ranking. Through rigorous experimental analysis, we validate the reliability of our methodology under the variability of datasets, offering a benchmarking strategy that balances quality and computational demands. This methodology enables a fair yet effective means of evaluating RecSys algorithms, providing valuable guidance for future research endeavors.