Low-rank field-weighted factorization machines for low-latency item recommendation

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: factorization machine, supervised learning, recommender system
TL;DR: Factorizing the field interaction matrix yields faster inference and better accuracy than pruning it.
Abstract: Factorization machine (FM) variants are widely used in item recommendation systems that operate under strict throughput and latency requirements, such as online advertising systems. Their main strength is their ability to model pairwise feature interactions while being resilient to data sparsity by learning factorized representations, and by having computational graphs that allow fast inference. Moreover, when items are ranked as part of a query for each incoming user, these graphs facilitate computing the portion stemming from the user and context fields only once per query, and the computational cost for each ranked item is proportional only to the number of fields that vary among the ranked items. Consequently, in terms of inference cost, the number of user or context fields is practically unlimited. More advanced variants of FMs, such as field-aware and field-weighted FMs, provide better accuracy by learning a representation of field-wise interactions, but require computing all pairwise interaction terms explicitly. In particular, the computational cost during inference is proportional to the square of the total number of fields, including user, context, and item fields. This is prohibitive in many systems when the number of fields is large, and imposes a limit on the number of user and context fields. To mitigate this limitation, heuristic pruning of low-intensity field interactions is commonly used to accelerate inference. In this work we propose an alternative to the pruning heuristic for field-weighted FMs: a diagonal plus symmetric low-rank decomposition of the field interaction matrix that reduces the computational cost of inference by making it proportional to the number of item fields only. Using a set of numerical experiments, we show that aggressive rank reduction outperforms similarly aggressive pruning, in terms of both accuracy and item recommendation speed. Beyond the computational complexity analysis, we corroborate our claim of faster inference experimentally by deploying our solution in a major online advertising system, where we observed significant ranking latency improvements.
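To illustrate the decomposition described in the abstract, the following is a minimal sketch (not the authors' code) of the field-weighted FM pairwise term when the field interaction matrix R is approximated as diag(d) + U Uᵀ. All names (fwfm_score, fwfm_score_low_rank, E, d, U) are illustrative assumptions; the actual parameterization and training procedure are described in the paper.

```python
# Sketch: field-weighted FM pairwise term with a diagonal-plus-low-rank
# field interaction matrix R ≈ diag(d) + U @ U.T (illustrative only).
import numpy as np

def fwfm_score(E, R):
    """Naive FwFM pairwise term: sum_{i<j} R[i, j] * <E[i], E[j]>.
    E: (n_fields, emb_dim) field embeddings already scaled by feature values.
    Cost is quadratic in the number of fields."""
    n = E.shape[0]
    G = E @ E.T  # Gram matrix of field embeddings
    return sum(R[i, j] * G[i, j] for i in range(n) for j in range(i + 1, n))

def fwfm_score_low_rank(E, d, U):
    """Same quantity when R = diag(d) + U @ U.T (symmetric low rank).
    Cost is O(n_fields * rank * emb_dim) instead of O(n_fields^2 * emb_dim)."""
    sq_norms = np.einsum("ic,ic->i", E, E)            # ||e_i||^2 per field
    full = d @ sq_norms + np.sum((U.T @ E) ** 2)      # sum_{i,j} R_ij <e_i, e_j>
    diag = (d + np.einsum("ik,ik->i", U, U)) @ sq_norms  # sum_i R_ii ||e_i||^2
    return 0.5 * (full - diag)  # drop self-interactions, halve double counting

# Consistency check on random data.
rng = np.random.default_rng(0)
n_fields, emb_dim, rank = 8, 4, 2
E = rng.normal(size=(n_fields, emb_dim))
d = rng.normal(size=n_fields)
U = rng.normal(size=(n_fields, rank))
R = np.diag(d) + U @ U.T
assert np.allclose(fwfm_score(E, R), fwfm_score_low_rank(E, d, U))
```

Under this factorization, the rank-space projections U.T @ E for user and context fields can plausibly be accumulated once per query, so that scoring each candidate item only requires adding the projections of the item fields, which is consistent with the per-item cost claimed in the abstract.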
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2261