Communication-Efficient Implementation of Block Recursive Algorithms on Distributed-Memory Machines

Published: 1994, Last Modified: 28 Jul 2025, ICPADS 1994, CC BY-SA 4.0
Abstract: This paper presents a design methodology for developing efficient distributed-memory parallel programs for block-recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. The methodology is specifically suited to distributed-memory architectures with circuit-switched or wormhole-routed mesh or hypercube interconnection networks, which characterize most modern supercomputers. Algorithms are represented in a mathematical framework based on the tensor product and other matrix operations. Communication-efficient implementations, with computation and communication effectively overlapped, are obtained by manipulating this representation using tensor algebra. Performance results for FFT programs on the Intel iPSC/860 and Intel Paragon are presented.
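To make the tensor-product representation concrete, consider the textbook Cooley-Tukey factorization of the DFT matrix for n = r·m: F_n = (F_r ⊗ I_m) T (I_r ⊗ F_m) L, where T is a diagonal matrix of twiddle factors and L is a stride permutation that reads the input at stride r. The sketch below is a minimal dense-matrix check of that identity in plain Python. It is an illustration only, not the paper's own formulation or notation, and the helper names (dft_matrix, kron, twiddle, stride_perm, factored_dft) are hypothetical. Under the usual reading of such formulas (an assumption here, not a claim about the paper's specific mapping), a factor of the form I_r ⊗ A is r independent local computations, while a permutation such as L is where data must move between processors.

```python
import cmath


def dft_matrix(n):
    """Dense n x n DFT matrix F_n with entries w**(j*k), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Tensor (Kronecker) product A (x) B of two dense matrices."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def matmul(a, b):
    """Plain triple-loop matrix product A . B."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def twiddle(r, m):
    """Diagonal T with entry w_n**(j1*k1) at position j1*m + k1, where n = r*m."""
    n = r * m
    w = cmath.exp(-2j * cmath.pi / n)
    t = [[0j] * n for _ in range(n)]
    for j1 in range(r):
        for k1 in range(m):
            t[j1 * m + k1][j1 * m + k1] = w ** (j1 * k1)
    return t


def stride_perm(r, m):
    """Permutation L with (L x)[j1*m + j2] = x[j1 + r*j2], i.e. read x at stride r."""
    n = r * m
    p = [[0.0] * n for _ in range(n)]
    for j1 in range(r):
        for j2 in range(m):
            p[j1 * m + j2][j1 + r * j2] = 1.0
    return p


def factored_dft(r, m):
    """Build (F_r (x) I_m) . T . (I_r (x) F_m) . L, which should equal F_{r*m}."""
    left = kron(dft_matrix(r), identity(m))    # r-point DFTs across blocks
    right = kron(identity(r), dft_matrix(m))   # r independent m-point DFTs
    return matmul(matmul(matmul(left, twiddle(r, m)), right), stride_perm(r, m))


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    for r, m in [(2, 2), (2, 4), (4, 2), (3, 5)]:
        err = max_abs_diff(factored_dft(r, m), dft_matrix(r * m))
        print(f"n = {r}*{m}: max |factored - F_n| = {err:.2e}")
```

Running it should report differences on the order of floating-point round-off for each (r, m) pair. Dense matrices are used only to make the identity easy to check; an actual FFT would apply each factor as a structured operation rather than forming the matrices explicitly.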