Abstract: Recommender systems have achieved remarkable success in various web applications, such as e-commerce, online advertising, and social media, by harnessing the power of big data. To attain optimal model performance, recommender systems are typically trained on very large datasets encompassing substantial numbers of users and items. However, large datasets often present challenges in terms of processing time and computational resources. Coreset selection offers a method for obtaining a reduced yet representative subset of a vast dataset, thereby improving the efficiency of training machine learning algorithms. Nevertheless, little research has explored the practical implications of different coreset selection approaches for the performance of recommender system algorithms. In this paper, we systematically investigate the impact of various coreset selection techniques. We evaluate the performance of the resulting coresets using inductive recommendation models, which allow evaluations to be performed consistently. The experimental results demonstrate that coreset methods are a powerful and useful approach for obtaining reduced datasets that preserve the properties of the large original dataset and achieve competitive performance relative to training on the full dataset while requiring substantially less training time.