Abstract: Least-Mean-Squares (\textsc{LMS}) solvers comprise a class of fundamental optimization problems such as linear regression and regularized regressions such as Ridge, LASSO, and Elastic-Net. Data-summarization techniques for big data generate summaries called coresets and sketches to speed up model learning under streaming and distributed settings. For example, \citep{nips2019} designs a fast and accurate Caratheodory set on the input data to boost the performance of existing \textsc{LMS} solvers. In retrospect, we explore the classical Householder transformation as a candidate for sketching and accurately solving \textsc{LMS} problems, and find it to be a simpler, more memory-efficient, and faster alternative to the above strong baseline that has always existed. We also present a scalable algorithm that constructs distributed Householder sketches to solve \textsc{LMS} problems across multiple worker nodes. We perform a thorough empirical analysis on large synthetic and real datasets to evaluate the performance of the Householder sketch and compare it with \citep{nips2019}. Our results show that the Householder sketch speeds up existing \textsc{LMS} solvers in the scikit-learn library by up to x-x. It is also x-x faster than the above baseline, with similar numerical stability. The distributed algorithm demonstrates linear scalability with near-negligible communication overhead.
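The following is a minimal illustrative sketch (not the authors' released code) of the idea the abstract describes, under the assumption that the Householder sketch is the triangular factor R of a Householder-based QR factorization of the augmented data matrix [X | y]: since R^T R = [X | y]^T [X | y], the small matrix R preserves exactly the inner products an \textsc{LMS} solver needs, so an off-the-shelf scikit-learn solver can be fit on the sketch instead of the full data. The shapes and the choice of Ridge as the downstream solver are assumptions for illustration only.

import numpy as np
from sklearn.linear_model import Ridge

# Tall-and-skinny synthetic data (assumed shapes, for illustration).
rng = np.random.default_rng(0)
n, d = 100_000, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Householder-based thin QR of the augmented matrix [X | y]; keep only R.
# R has shape (d+1, d+1) and satisfies R^T R = [X | y]^T [X | y].
R = np.linalg.qr(np.column_stack([X, y]), mode="r")
X_sketch, y_sketch = R[:, :d], R[:, d]

# Feed the tiny sketch to an existing scikit-learn LMS solver.
alpha = 1.0
full = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
sketched = Ridge(alpha=alpha, fit_intercept=False).fit(X_sketch, y_sketch)

# The two solutions agree up to floating-point error, because the normal
# equations built from (X_sketch, y_sketch) equal those built from (X, y).
print(np.allclose(full.coef_, sketched.coef_, atol=1e-8))

In this sketch the n x d data matrix is compressed to (d+1) rows before the solver ever sees it, which is where the reported speedups would come from; the distributed variant described in the abstract would presumably compute such R factors per worker and merge them, but that merging step is not shown here.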