Data Mixing for Group Preference Heterogeneity in Collaborative Filtering

Published: 02 Jun 2026, Last Modified: 02 Jun 2026, Venue: Pluralistic-Alignment 2026, License: CC BY 4.0
Keywords: data mixing, homogenization, collaborative filtering, recommender systems
Abstract: Machine learning models often fail to capture the unique preferences of social groups. While much work has focused on model development to capture heterogeneity, we examine how the composition of training data, mediated by group-level data mixing, affects group-preference alignment. The central question is: given a target group, how does adding training data from augmentation groups impact model alignment with the target group's preferences? We examine preference alignment in the context of collaborative filtering. While it is difficult to specify the optimal data mix for generic prediction models, we show that in a matrix completion setting, the recovery-bound optimal mix minimizes the prevalence disparity among item classes. Strikingly, experiments on benchmark recommendation datasets reveal that optimizing the data mix does not reliably increase group alignment, because at standard embedding dimensions such as $d=64$, the differences among groups are insufficient to warrant data mixing. In very low-dimensional models, however, data mixing can leverage group differences to increase preference alignment. The experiments are consistent with our theoretical result showing that the augmentation groups most similar to the target are not necessarily the most beneficial for alignment.
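To make the mixing question in the abstract concrete, here is a minimal sketch of group-level data mixing for matrix completion. It is not the authors' method. Every name in it (`mix_groups`, `als`, `synth_group`, `target_rmse`, `run_sweep`) is hypothetical. It assumes ratings are stored as users × items NumPy arrays with `np.nan` for unobserved entries. It also assumes that a "mix" means appending a chosen number of rows from one augmentation group to the target group's rows, and that alternating least squares (ALS) with embedding dimension `d` stands in for the recommender. The data are synthetic: both groups share item factors, and the augmentation group's user factors are shifted. Alignment is then measured as RMSE on held-out target-group entries.

```python
import numpy as np


def mix_groups(target, augment, n_aug, rng):
    """Hypothetical mixing rule: stack all target rows with n_aug rows
    sampled without replacement from one augmentation group."""
    idx = rng.choice(augment.shape[0], size=n_aug, replace=False)
    return np.vstack([target, augment[idx]])


def als(R, d, reg=0.1, iters=20, seed=0):
    """Ridge-regularized ALS fitted only on observed (non-NaN) entries.
    Returns user factors U (n x d) and item factors V (m x d)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    mask = ~np.isnan(R)
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    ridge = reg * np.eye(d)
    for _ in range(iters):
        for u in range(n):  # solve each user's row with V fixed
            obs = mask[u]
            Vo = V[obs]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, obs])
        for i in range(m):  # solve each item's row with U fixed
            obs = mask[:, i]
            Uo = U[obs]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, i])
    return U, V


def synth_group(n_users, V_true, shift, rng, noise=0.1):
    """Synthetic group: shared item factors, user factors shifted by `shift`."""
    U = rng.standard_normal((n_users, V_true.shape[1])) + shift
    return U @ V_true.T + noise * rng.standard_normal((n_users, V_true.shape[0]))


def target_rmse(U, V, test_mask, truth, n_target):
    """RMSE on held-out target entries (target rows come first in the mix)."""
    pred = U[:n_target] @ V.T
    err = (pred - truth)[test_mask]
    return float(np.sqrt(np.mean(err ** 2)))


def run_sweep(aug_sizes, d=3, seed=0):
    """Train on target + n_aug augmentation rows for each size and
    report target-group RMSE per mix."""
    rng = np.random.default_rng(seed)
    m, r = 40, 3
    V_true = rng.standard_normal((m, r))
    target_full = synth_group(30, V_true, shift=0.0, rng=rng)
    aug_full = synth_group(200, V_true, shift=1.0, rng=rng)

    observed = rng.random(target_full.shape) < 0.3
    # Hold out half of the unobserved target entries for evaluation.
    test_mask = (~observed) & (rng.random(target_full.shape) < 0.5)
    target_train = np.where(observed, target_full, np.nan)
    aug_train = np.where(rng.random(aug_full.shape) < 0.3, aug_full, np.nan)

    results = {}
    for n_aug in aug_sizes:
        R = mix_groups(target_train, aug_train, n_aug, rng)
        U, V = als(R, d)
        results[n_aug] = target_rmse(U, V, test_mask, target_full, target_train.shape[0])
    return results


if __name__ == "__main__":
    for n_aug, rmse in run_sweep((0, 50, 200)).items():
        print(f"augmentation rows={n_aug:4d}  target RMSE={rmse:.3f}")
```

Sweeping `d` as well as the augmentation size is one way to probe the abstract's observation that the dimension of the model affects whether mixing helps. This toy setup is not expected to reproduce the paper's results, and the shift-based group model is an assumption made here for illustration.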
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 90