Abstract
This paper provides a fairly comprehensive treatment of a broad class of algorithms as it pertains to systolic implementation. We describe some formal algorithmic transformations that can be utilized to map regular and some irregular compute-bound algorithms into the best-fit time-optimal systolic architectures. This methodology uses the concept of dependence vectors to order in time and space the index points representing the algorithm. However, by differentiating between two types of dependence vectors, the ordering procedure is allowed to be flexible and time optimal. Furthermore, the approach reported here deals with variable as well as fixed dependence vectors and does not put constraints on the topology or dimensionality of the target architecture. The ordered index points are represented by nodes in a diagram called Systolic Precedence Diagram (SPD). The SPD is transformed into a directed graph called the Systolic Directed Graph (SDG) which can be projected along defined directions to obtain the target architectures. If more than one valid projection direction exist, different designs are obtained. The resulting architectures. If more than one valid projection an improvement in the performance can be achieved by increasing PE fan-out. If so, the method provides the corresponding systolic implementation. The methodology has been tried on many signal processing, image processing, and graph theory algorithms and new arrays were designed as a result. In this paper, the methodology is illustrated by mapping three problems, namely, vector-matrix multiplication, matrix-matrix multiplication, and transitive closure problems into many planar and nonplanar time-optimal VLSI arrays.
Original language | British English |
---|---|
Pages (from-to) | 33-61 |
Number of pages | 29 |
Journal | Parallel Computing |
Volume | 19 |
Issue number | 1 |
DOIs | |
State | Published - Jan 1993 |
Keywords
- algorithmic transformations
- graph methodology for regularizing data flow
- mapping
- systolic architecture
- Systolic implementation
- time-optimal VLSI arrays