Title page for 80325041


[Back to Results | New Search]

Student Number 80325041
Author Kuei-Ping Shih(۶Q)
Author's Email Address No Public.
Statistics This thesis had been viewed 296 times. Download 0 times.
Department Computer Science and Information Engineering
Year 1997
Semester 2
Degree Ph.D.
Type of Document Doctoral Dissertation
Language zh-TW.Big5 Chinese
Title Efficient Computation and Communication Sets Generations for Data-Parallel Programs on Multicomputers
Date of Defense
Page Count 174
Keyword
  • communication sets
  • computation sets
  • data-parallel languages
  • distributed-memory multicomputers
  • parallel processing
  • parallelizing compilers
  • Abstract This dissertation addresses the issues of generating
    computation andcommunication sets for data-parallel programs on
    distributed-memorymulticomputers. Data-parallel programs provide
    a global address space and data distribution directives for
    programmers to specify data distribution.A parallelizing
    compiler for data-parallel languages should
    distributecomputation onto processors by the owner-computes
    rule, allocate data across processors by user-specified data
    distribution directives, and managecommunication among
    processors for non-local data elements. In general, data-
    parallel languages use a two-level mapping for data to processor
    mapping.A two-level mapping provides user to specify the data-
    processor mapping byaligning related array objects with a
    template, and then distributing the template onto the user-
    declared abstract processors. Three regular data distributions,
    block, cyclic, and block-cyclic data distributions, are provided
    by data-parallel languages. The most general data distribution
    among the three distributions is the block-cyclic distribution.
    Hence, the dissertation proposes several compilation techniques
    to deal with the computation and communication sets generations
    for two-level mappings withblock-cyclic distribution. First of
    all, we present a scheme to efficiently generate the computation
    and communication sets for one-level mappings. A one-level
    mapping is a special case of a two-level mapping, which assumes
    that the array elements are identically aligned with the
    template. We use a class table to store the information that is
    extracted from the array statements and data distribution
    patterns. Given data distributions and an array statement, we
    can use theclass table to generate the communication sets in
    closed forms. Furthermore, we derive the SPMD codes for sending
    and receiving the necessary data between processors. An
    algorithm to generate the class table is presented. The time
    complexity of this algorithm is O(s), where s is the array
    access stride.The technique to generate communication sets for
    single induction variablehas been implemented on a DEC Alpha
    3000 workstation. The experimental results do confirm the
    advantages of our scheme, especially when the array access
    stride is larger than the distribution block size. We also
    extend our approach to a perfectly n-nested loop with general
    affine subscripts. The time complexity to construct all
    different class tables is O(s^2). In this dissertation, we also
    propose compilation techniques to compressholes, which are
    caused by the non-unit alignment stride in a two-leveldata-
    processor mapping. Holes are the memory locations mapped by
    useless template cells. To fully utilize the memory space,
    memory holes should be removed. In a two-level data-processor
    mapping, there is a repetitive pattern for array elements mapped
    onto processors. We classify blocks into classes and use a class
    table to record the distribution of each class in the first
    repetitive data distribution pattern. Similarly, data
    distribution on a processor also has a repetitive pattern. We
    use a compression table to record the distribution of each block
    in the first repetitive data distribution pattern on a
    processor. By using class table and compression table, hole
    compression can be easily and efficiently achieved. Compressing
    holes can save memory usage, improve spatial locality and
    further increase system performance.The proposed method is
    efficient, stable and easy implement. The experimental results
    do confirm the advantages of our proposed method over existing
    methods.Moreover, based on the premise of hole compression, the
    computation andcommunication sets generations for an array
    statement and data redistributionare presented as well. A more
    efficient compilation technique to generate the computation sets
    forblock-cyclically distributed array references with affine
    subscripts in atwo-nested loop is also proposed in this
    dissertation. For the memory accesses of an array reference with
    affine subscript within a two-nested loop, there exist
    repetitive patterns both at the outer and the inner loops. We
    use tables to record the memory accesses of repetitive patterns.
    According to these tables, a new start-computation algorithm is
    proposed to compute the starting elements on a processor for
    each outer loop iteration. The complexities of the table
    constructions are O(k+s2), where k is the distribution block
    size and s2 is the access stride for the inner loop. After
    tables are constructed, generating each starting element for
    each outer loop iteration can run in O(1) time. Moreover, we
    also show that the repetitive iterations for the outer loop are
    Pk/gcd(Pk, s1), where P is the number of processors and s1 is
    the access stride for the outer loop. Therefore, the total
    complexity to generate the computation sets for a block-
    cyclically distributed array with affine subscript in a two-
    nested loop is O(Pk/gcd(Pk, s1)+k+s2). The proposed approach is
    betterthan the known methods if s2 < Pk^2. In general, s2 is
    much smaller than Pk in real applications. Thus, the dominated
    term would be Pk. As a result, our proposed approach is much
    better than the existing methods. Moreover, the techniques
    proposed in the dissertation have been implementedand
    incorporated into our long-term project named UPPER (User-
    interactiveParallel Programming EnviRonment). This environment
    is designed and implementedat a DEC Alpha 3000 workstation with
    Motif environment. Given a data-parallel program (HPF, in this
    environment), we can generate the computation and communication
    sets as well as the SPMD code. The parallel execution platform
    is nCUBE/2. The generated SPMD code can be run on nCUBE/2 with
    user-specified number of processors. The performance analyses
    and comparisons are also given.
    Table of Content Cover
    Chinese Chapter
    Contents
    1 Introduction
    1.1 Motivations
    1.2 Previous Works
    1.3 Achievements and Contribution of the Dissertation
    1.4 Organization of the Dissertation
    2 Data-Parallel Languages Features
    2.1 The Basic Features of HPF
    2.2 Compiler Support
    3 Computation and Communication Sets Generations for One-Level Mappings
    3.1 Introduction
    3.2 Preliminaries
    3.3 Compilation of Array References
    3.4 Experimental Results
    3.5 Summary
    4 Computation and Communication Sets Generations for Two-Level Mappings
    4.1 Introduction
    4.2 Characteristics of Class Table
    4.3 Hole Compression
    4.4 Computation and Communication Sets Generations for Assignment Satatement and Data Redistribution
    4.5 Experimental Results
    4.6 Related Works
    4.7 Summary
    5 Computation Sets Generations for a Two-Nested Loo- with Affine Subscripts in One-Level Mappings
    5.1 Introduction
    5.2 Address Generation for Affine Subscripts
    5.3 Generating Starting Elements for s > k
    5.4 Tables Constructions
    5.5 Performance Analyses and Comparisons
    5.6 Summary
    6 Implementation of a Communication-Efficient Data-Parallel Programs Compiling System
    6.1 Introduction
    6.2 System Overview
    6.3 Implementation Issues
    6.4 Summary
    7 Conclusions
    7.1 Summary of Contributions
    7.2 Future Works
    Biliography
    Reference
    Advisor
  • Jang-Ping Sheu(\)
  • Files No Any Full Text File.
    Date of Submission

    [Back to Results | New Search]


    Browse | Search All Available ETDs

    If you have dissertation-related questions, please contact with the NCU library extension service section.
    Our service phone is (03)422-7151 Ext. 57407,E-mail is also welcomed.