Matrix Factorization of Large Scale Data Using Block Based Approach and GPU Acceleration

Published: 2024, Last Modified: 27 Jan 2026, BDA 2024, CC BY-SA 4.0
Abstract: Matrix Factorization (MF) of large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the memory available on a GPU is finite. Leveraging GPUs therefore requires techniques that not only enable parallelism but also address memory limitations. One challenge that arises when leveraging GPUs for MF is synchronizing data between the CPU and the GPUs. Another is isolating the data belonging to a specific computational unit, so as to avoid race conditions, while still allowing data to be shared between computational units. Currently available matrix factorization models use GPUs only for independent vector operations, which require multiple back-and-forth data transfers between computational units and thereby lead to only partial utilization of the GPUs. Existing works are also limited to approximate computation because of race conditions while updating the matrices. We propose a novel block-based approach to matrix factorization that leverages GPUs. The proposed method addresses factorization of large-scale data by identifying independent blocks, each of which is factorized in parallel using multiple computational units. The approach can be extended to one or more GPUs and even to distributed systems. The RMSE of the block-based approach is within an acceptable delta of the RMSE obtained by a CPU-based variant and a multi-threaded CPU variant of a similar Stochastic Gradient Descent (SGD) kernel implementation. The block-based variant offers a significant speed advantage over the other variants.
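The idea of "independent blocks" can be illustrated with a small sketch. It assumes a standard stratified scheme (as in DSGD/FPSGD-style methods), which may differ from the authors' exact method: the rating matrix is split into a p x p grid, and in each sub-epoch the p blocks (b, (b + s) mod p) share no user range and no item range, so plain SGD can update them concurrently without locks or race conditions. All names (`partition`, `sgd_block`, `block_mf`, `rmse`) and hyperparameters are hypothetical. Python threads stand in for GPU computational units and, because of the GIL, show only the scheduling structure, not the speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def partition(ratings, n_users, n_items, p):
    """Bucket (user, item, rating) triples into a p x p grid by row/column range."""
    blocks = [[[] for _ in range(p)] for _ in range(p)]
    for u, i, r in ratings:
        blocks[u * p // n_users][i * p // n_items].append((u, i, r))
    return blocks


def sgd_block(block, P, Q, lr, reg):
    """Plain SGD over one block; it only touches the rows of P and Q in this block."""
    for u, i, r in block:
        pu, qi = P[u], Q[i]
        err = r - sum(a * b for a, b in zip(pu, qi))
        for k in range(len(pu)):
            pk, qk = pu[k], qi[k]
            pu[k] += lr * (err * qk - reg * pk)
            qi[k] += lr * (err * pk - reg * qk)


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given triples."""
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


def block_mf(ratings, n_users, n_items, k=4, p=3, epochs=50, lr=0.02, reg=0.02, seed=0):
    """Block-based SGD: each sub-epoch runs p mutually independent blocks in parallel."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    blocks = partition(ratings, n_users, n_items, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        for _ in range(epochs):
            for s in range(p):
                # Stratum s: row ranges b and column ranges (b + s) % p are all distinct,
                # so no two workers ever write the same row of P or Q.
                stratum = [blocks[b][(b + s) % p] for b in range(p)]
                list(pool.map(lambda blk: sgd_block(blk, P, Q, lr, reg), stratum))
    return P, Q
```

Because the blocks in a stratum write disjoint rows of `P` and `Q`, the result does not depend on thread scheduling. On a GPU, the same stratum would be dispatched as p concurrent block kernels, and only the factor rows of the active blocks would need to reside in device memory.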