Abstract: In this paper we study the parallelization of CGLS, a basic iterative method for large, sparse least squares problems whose main idea is to organize the computation of the conjugate gradient method applied to the normal equations. A performance model based on the isoefficiency concept is used to analyze the behavior of this method when implemented on massively parallel distributed-memory computers with a two-dimensional mesh communication scheme. Two different mappings of data to processors, namely simple stripe and cyclic stripe partitioning, are compared by inserting their communication times into the isoefficiency model, which captures scalability. Theoretically, the cyclic stripe partitioning is shown to be asymptotically more scalable.
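For orientation, the following is a minimal sequential sketch of the textbook CGLS iteration, not the parallel implementation analyzed in the paper. It uses dense lists of lists for brevity, and the names `cgls`, `matvec`, `rmatvec`, `tol`, and `max_iter` are illustrative assumptions. Each iteration needs one product with A, one with its transpose, and two inner products; on a distributed-memory machine these are the operations that would incur communication.

```python
def matvec(A, v):
    """Return A @ v for a dense matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rmatvec(A, v):
    """Return A^T @ v without forming the transpose explicitly."""
    out = [0.0] * len(A[0])
    for row, vi in zip(A, v):
        for j, a in enumerate(row):
            out[j] += a * vi
    return out


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cgls(A, b, tol=1e-12, max_iter=None):
    """Approximately minimize ||b - A x||_2 by CG on the normal equations.

    A^T A is never formed; only products with A and A^T are used.
    tol and max_iter are hypothetical stopping parameters for this sketch.
    """
    n = len(A[0])
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)                 # residual b - A x for the start x = 0
    s = rmatvec(A, r)           # residual of the normal equations, A^T r
    p = list(s)                 # first search direction
    gamma = dot(s, s)
    gamma0 = gamma
    if gamma0 == 0.0:
        return x                # x = 0 already solves the problem
    for _ in range(max_iter):
        q = matvec(A, p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        if gamma_new <= tol * tol * gamma0:
            break               # ||A^T r|| has dropped by the factor tol
        beta = gamma_new / gamma
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 3.0]
    print(cgls(A, b))
```

In the small usage example, the system is consistent, so the least squares solution coincides with the exact solution of the three equations.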