[Back to Results | New Search]

Student Number 80325041 Author Kuei-Ping Shih(¥Û¶Q¥) Author's Email Address No Public. Statistics This thesis had been viewed 335 times. Download 11 times. Department Computer Science and Information Engineering Year 1997 Semester 2 Degree Ph.D. Type of Document Doctoral Dissertation Language zh-TW.Big5 Chinese Title Efficient Computation and Communication Sets Generations for Data-Parallel Programs on Multicomputers Date of Defense Page Count 174 Keyword communication sets computation sets data-parallel languages distributed-memory multicomputers parallel processing parallelizing compilers Abstract This dissertation addresses the issues of generating

computation andcommunication sets for data-parallel programs on

distributed-memorymulticomputers. Data-parallel programs provide

a global address space and data distribution directives for

programmers to specify data distribution.A parallelizing

compiler for data-parallel languages should

distributecomputation onto processors by the owner-computes

rule, allocate data across processors by user-specified data

distribution directives, and managecommunication among

processors for non-local data elements. In general, data-

parallel languages use a two-level mapping for data to processor

mapping.A two-level mapping provides user to specify the data-

processor mapping byaligning related array objects with a

template, and then distributing the template onto the user-

declared abstract processors. Three regular data distributions,

block, cyclic, and block-cyclic data distributions, are provided

by data-parallel languages. The most general data distribution

among the three distributions is the block-cyclic distribution.

Hence, the dissertation proposes several compilation techniques

to deal with the computation and communication sets generations

for two-level mappings withblock-cyclic distribution. First of

all, we present a scheme to efficiently generate the computation

and communication sets for one-level mappings. A one-level

mapping is a special case of a two-level mapping, which assumes

that the array elements are identically aligned with the

template. We use a class table to store the information that is

extracted from the array statements and data distribution

patterns. Given data distributions and an array statement, we

can use theclass table to generate the communication sets in

closed forms. Furthermore, we derive the SPMD codes for sending

and receiving the necessary data between processors. An

algorithm to generate the class table is presented. The time

complexity of this algorithm is O(s), where s is the array

access stride.The technique to generate communication sets for

single induction variablehas been implemented on a DEC Alpha

3000 workstation. The experimental results do confirm the

advantages of our scheme, especially when the array access

stride is larger than the distribution block size. We also

extend our approach to a perfectly n-nested loop with general

affine subscripts. The time complexity to construct all

different class tables is O(s^2). In this dissertation, we also

propose compilation techniques to compressholes, which are

caused by the non-unit alignment stride in a two-leveldata-

processor mapping. Holes are the memory locations mapped by

useless template cells. To fully utilize the memory space,

memory holes should be removed. In a two-level data-processor

mapping, there is a repetitive pattern for array elements mapped

onto processors. We classify blocks into classes and use a class

table to record the distribution of each class in the first

repetitive data distribution pattern. Similarly, data

distribution on a processor also has a repetitive pattern. We

use a compression table to record the distribution of each block

in the first repetitive data distribution pattern on a

processor. By using class table and compression table, hole

compression can be easily and efficiently achieved. Compressing

holes can save memory usage, improve spatial locality and

further increase system performance.The proposed method is

efficient, stable and easy implement. The experimental results

do confirm the advantages of our proposed method over existing

methods.Moreover, based on the premise of hole compression, the

computation andcommunication sets generations for an array

statement and data redistributionare presented as well. A more

efficient compilation technique to generate the computation sets

forblock-cyclically distributed array references with affine

subscripts in atwo-nested loop is also proposed in this

dissertation. For the memory accesses of an array reference with

affine subscript within a two-nested loop, there exist

repetitive patterns both at the outer and the inner loops. We

use tables to record the memory accesses of repetitive patterns.

According to these tables, a new start-computation algorithm is

proposed to compute the starting elements on a processor for

each outer loop iteration. The complexities of the table

constructions are O(k+s2), where k is the distribution block

size and s2 is the access stride for the inner loop. After

tables are constructed, generating each starting element for

each outer loop iteration can run in O(1) time. Moreover, we

also show that the repetitive iterations for the outer loop are

Pk/gcd(Pk, s1), where P is the number of processors and s1 is

the access stride for the outer loop. Therefore, the total

complexity to generate the computation sets for a block-

cyclically distributed array with affine subscript in a two-

nested loop is O(Pk/gcd(Pk, s1)+k+s2). The proposed approach is

betterthan the known methods if s2 < Pk^2. In general, s2 is

much smaller than Pk in real applications. Thus, the dominated

term would be Pk. As a result, our proposed approach is much

better than the existing methods. Moreover, the techniques

proposed in the dissertation have been implementedand

incorporated into our long-term project named UPPER (User-

interactiveParallel Programming EnviRonment). This environment

is designed and implementedat a DEC Alpha 3000 workstation with

Motif environment. Given a data-parallel program (HPF, in this

environment), we can generate the computation and communication

sets as well as the SPMD code. The parallel execution platform

is nCUBE/2. The generated SPMD code can be run on nCUBE/2 with

user-specified number of processors. The performance analyses

and comparisons are also given.Table of Content Cover

Chinese Chapter

Contents

1 Introduction

1.1 Motivations

1.2 Previous Works

1.3 Achievements and Contribution of the Dissertation

1.4 Organization of the Dissertation

2 Data-Parallel Languages Features

2.1 The Basic Features of HPF

2.2 Compiler Support

3 Computation and Communication Sets Generations for One-Level Mappings

3.1 Introduction

3.2 Preliminaries

3.3 Compilation of Array References

3.4 Experimental Results

3.5 Summary

4 Computation and Communication Sets Generations for Two-Level Mappings

4.1 Introduction

4.2 Characteristics of Class Table

4.3 Hole Compression

4.4 Computation and Communication Sets Generations for Assignment Satatement and Data Redistribution

4.5 Experimental Results

4.6 Related Works

4.7 Summary

5 Computation Sets Generations for a Two-Nested Loo- with Affine Subscripts in One-Level Mappings

5.1 Introduction

5.2 Address Generation for Affine Subscripts

5.3 Generating Starting Elements for s > k

5.4 Tables Constructions

5.5 Performance Analyses and Comparisons

5.6 Summary

6 Implementation of a Communication-Efficient Data-Parallel Programs Compiling System

6.1 Introduction

6.2 System Overview

6.3 Implementation Issues

6.4 Summary

7 Conclusions

7.1 Summary of Contributions

7.2 Future Works

BiliographyReference Advisor Jang-Ping Sheu(³\°·¥)

Files No Any Full Text File. Date of Submission

Our service phone is (03)422-7151 Ext. 57407,E-mail is also welcomed.