Abstract
Pipelining is normally associated with shared-memory and vector computers and is rarely used as an algorithm design technique for distributed-memory architectures. In this paper we show how pipelining enables communication and computation to be overlapped on a distributed-memory parallel computer (a 128-processor T800 Parsytec SuperCluster), yielding a significant speedup. A linear solver based on Givens rotations is selected and parallelized using two different techniques. A non-overlapping algorithm using collective communication, such as optimized broadcast and collection, is compared with a pipelined (overlapping) algorithm using only simple point-to-point communication between neighbouring processors. Both algorithms use the same computational modules, which have been identified and extracted from the sequential code.
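To make the overlap idea concrete, the following is a minimal sketch of one way a pipelined Givens scheme can hide point-to-point communication behind computation: each processor merges a triangular factor arriving row by row from its left neighbour into its own factor and forwards each finalized row with a non-blocking send while it keeps rotating. This is not the authors' code or platform API (the original ran over the transputer links of the Parsytec SuperCluster); C with MPI is used here purely for illustration, and the data layout, the names `N`, `r`, `annihilate`, and the particular merge scheme are assumptions, not taken from the paper.

```c
/*
 * Illustrative sketch only: pipelined merging of local triangular factors
 * along a processor chain, overlapping the forwarding of finished rows
 * (non-blocking point-to-point sends) with the remaining Givens updates.
 * MPI stands in for the transputer channels of the original platform.
 */
#include <math.h>
#include <mpi.h>

#define N 256                 /* problem size (assumed)                      */

static double r[N][N];        /* local upper-triangular factor, assumed to be
                                 precomputed by local Givens rotations       */

/* Givens rotation that zeroes row[k] against the pivot element d[k].        */
static void annihilate(double *d, double *row, int k, int n)
{
    double rho = hypot(d[k], row[k]);
    if (rho == 0.0) return;                 /* nothing to eliminate          */
    double c = d[k] / rho, s = row[k] / rho;
    for (int j = k; j < n; ++j) {
        double x = d[j], y = row[j];
        d[j]   =  c * x + s * y;
        row[j] = -s * x + c * y;
    }
}

int main(int argc, char **argv)
{
    int rank, size;
    double in[N];
    MPI_Request send_req = MPI_REQUEST_NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* ... fill r[][] with this processor's local triangular factor ...      */

    for (int k = 0; k < N; ++k) {
        if (rank > 0) {
            /* Row k of the upstream factor (zero in columns < k); merging it
               only modifies local rows k..N-1.                              */
            MPI_Recv(in, N, MPI_DOUBLE, rank - 1, k,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = k; i < N; ++i)
                annihilate(r[i], in, i, N);
        }

        /* Local row k is now final: forward it with a non-blocking send and
           keep working on rows k+1..N-1 while the transfer is in flight.    */
        if (rank + 1 < size) {
            MPI_Wait(&send_req, MPI_STATUS_IGNORE);   /* reuse the handle    */
            MPI_Isend(r[k], N, MPI_DOUBLE, rank + 1, k,
                      MPI_COMM_WORLD, &send_req);
        }
    }

    if (send_req != MPI_REQUEST_NULL)
        MPI_Wait(&send_req, MPI_STATUS_IGNORE);

    /* The last processor in the chain now holds the combined R factor.      */
    MPI_Finalize();
    return 0;
}
```

A non-overlapping counterpart would instead broadcast or collect whole factors with collective operations, so each processor would finish communicating before it starts rotating; the per-row forwarding above is what lets communication and computation proceed concurrently.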
| Original language | British English |
| --- | --- |
| Pages (from-to) | 37-42 |
| Number of pages | 6 |
| Journal | Supercomputer |
| Volume | 12 |
| Issue number | 3 |
| State | Published - Aug 1996 |