Abstract: Shared-memory systems such as regular desktops now possess enough memory to store large data. However, the training process for data classification can still be slow if we do not fully utilize the power of multi-core CPUs. Many existing works proposed parallel machine learning algorithms by modifying serial ones, but convergence analysis may be complicated. Instead, we do not modify machine learning algorithms, but consider those that can take the advantage of parallel matrix operations. We particularly investigate the use of parallel sparse matrix-vector multiplications in a Newton method for large scale logistic regression. Various implementations from easy to sophisticated ones are analyzed and compared. Results indicate that under suitable settings excellent speedup can be achieved.
Loading